# J. Jesus Cerón-Rojas • José Crossa

# Linear Selection Indices in Modern Plant Breeding

*Foreword by* Daniel Gianola


J. Jesus Cerón-Rojas, Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico, Mexico

José Crossa, Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico, Mexico

Chapter 10 was written by Fernando H. Toledo, José Crossa and Juan Burgueño. Chapter 11 was written by Gregorio Alvarado, Angela Pacheco, Sergio Pérez-Elizalde, Juan Burgueño and Francisco M. Rodríguez.

ISBN 978-3-319-91222-6 ISBN 978-3-319-91223-3 (eBook) https://doi.org/10.1007/978-3-319-91223-3

Library of Congress Control Number: 2018942233

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG part of Springer Nature.

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Newi

### Foreword

Genetic improvement programs of plants and livestock are aimed at maximizing the rate of increase of some merit function (e.g., economic value of a wheat line) that is expected to have a genetic basis. Typically, candidates for selection with the highest merit are kept as parents of the subsequent generation and those with the lowest merit are eliminated ("culled") or used less intensively. There are at least two key questions associated with this endeavor: how merit is defined and how it is assessed.

Merit can be represented by a linear or nonlinear function of genetic values for several traits regarded as important from the perspective of producing economic returns or benefits. The genetic component of merit cannot be observed; thus, it must be inferred from data on the candidates for selection, or on their relatives. Hence, and apart from the issue of specifying economic values (an area requiring expertise beyond animal and plant breeding), the problem of inferring merit is a largely statistical one.

This book represents a substantial compilation of work done in an area known as "selection indices" in animal and plant breeding. Selection indices were originally developed by Smith (1936) in plant breeding and by Hazel (1943) in animal breeding to address the selection of plants or animals scored for multiple attributes. In agriculture, the breeding worth (or net genetic merit) of a candidate for selection depends on several traits. For example, milk production and composition, health, reproductive performance, and life-span in dairy cows; and grain yield, disease resistance, and flowering time in maize. Smith (1936) defined a linear merit function

in which the "merit" ($H$, say) of a candidate was expressed as $H = \sum_{i=1}^{t} w_i g_i$, where $t$ is the number of traits, $g_i$ is the unobservable additive genetic value (breeding value) of the candidate for trait $i$, and $w_i$ is the relative economic value of trait $i$ (calculated externally and taken as a known quantity); in vector notation, $H = \mathbf{w}'\mathbf{g}$, where $\mathbf{w}$ and $\mathbf{g}$ are $t \times 1$ vectors of relative economic values and breeding values respectively. The preceding definition of $H$ implies that the rate of increase of merit rises by $w_i$ units as the breeding value for trait $i$ rises by one unit; thus, it is somewhat naïve, as it does not contemplate diminishing returns, nonlinearity, or situations in which the economic return from increasing trait 1, say, depends on the genetic level for trait 2.

The book contains a wealth of material on how various types of linear indices can be constructed, interpreted, optimized, and applied. The techniques described in the book were developed mainly with plant breeding as a focal point, an area in which the authors have wide experience. However, I expect that the book will be of interest to animal breeders as well. The linear selection index (LSI) theory developed in this book is based on the Smith (1936) and Hazel (1943) linear phenotypic selection index (LPSI) (Chap. 2), and all the LSIs described in Chaps. 3–9 are only variants of the LPSI. Thus, in Chap. 3, the authors describe null restrictions and non-null predetermined restrictions imposed on the expected genetic gain of the LPSI. In Chap. 4, the authors incorporate molecular marker information into the LPSI, and in Chap. 5 genomic estimated breeding values (GEBVs) are included in the LPSI. Interestingly, Chap. 6 shows how the restrictive LPSI is used in the genomic selection context, but this is based on the LPSI theory of Smith (1936) and Hazel (1943). In Chaps. 7 and 8 the only change is to assume that the economic weights are fixed, but unknown, and then, based on this assumption, the authors demonstrate the eigen selection index method (ESIM) and its variants, which are, of course, associated with the LPSI. In Chap. 9, the reader is shown how to combine the LPSI theory with the independent culling method to develop the multistage selection index theory.

Chapter 10 shows results on stochastic simulations from cycles of selections using the linear phenotypic selection index (LPSI), the ESIM, the restrictive LPSI and the restrictive ESIM. In Chap. 11 the use of RindSel (R software to analyze Selection Indices) is presented with examples for using unrestrictive, restrictive, null or predetermined proportional gain indices.

Animal and plant breeders follow somewhat different routes in the treatment of multiple-trait improvement by selection, mainly because the former field deals with candidates possessing an unequal amount of information, and extensive genetic inter-relatedness. Recently, however, genomic selection has reunified perspectives somewhat. In animal breeding, Henderson (1973) introduced the notion of "best prediction," and showed that the conditional expectation function E(H/DATA), where DATA represents all available records on all traits, unbalanced or not, was the "best predictor" in the sense of the mean squared error. He also showed that the best predictor had some additional properties that were appealing from a response to selection perspective.

In a multiple-trait context and assuming multivariate normality (with known parameters) of the joint distribution of genetic values and DATA, the best predictor retrieves the selection index evaluation derived by Smith (1936) and Hazel (1943) in less general settings (Henderson 1963). It follows immediately that if w is known, the best predictor of merit is


$$E(\text{H/DATA}) = E(\mathbf{w}' \mathbf{g} / \text{DATA}) = \mathbf{w}' E(\mathbf{g} / \text{DATA})$$

where E(g/DATA) is the best predictor of the breeding values. Smith (1936) and Hazel (1943) failed to recognize that the economic values did not need to enter into the selection index until after the predictions of the breeding values were obtained, simply because of linear invariance. Bulmer (1980) pointed out, pertinently, that it was unclear why ranking animals using a predictor, minimizing the mean squared error of prediction, would maximize expected genetic progress in a single round of selection, and suggested an alternative predictor that was later shown by Gianola and Goffinet (1982) and Fernando and Gianola (1986) to be exactly the best predictor. Animal breeders can perhaps interpret many of the results given in this book from such a perspective.
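The linear invariance mentioned above can be illustrated numerically. The following is a minimal sketch, assuming multivariate normality with known parameters, one record per trait expressed as a deviation from the mean, so that E(g/DATA) = G P⁻¹ y; the matrices `G`, `R`, the weights `w`, and the records `y` are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical covariance matrices for t = 3 traits (illustrative values only).
G = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.8],
              [0.5, 0.8, 2.0]])      # genetic covariance, Var(g)
R = np.diag([6.0, 5.0, 4.0])         # residual covariance (assumed independent of g)
P = G + R                            # phenotypic covariance, Var(y)
w = np.array([1.0, -0.5, 2.0])       # economic values, assumed known
y = np.array([1.2, -0.4, 0.7])       # one candidate's records, as deviations from the mean

# Route 1: predict the breeding values first, then apply the economic values.
g_hat = G @ np.linalg.solve(P, y)    # E(g/DATA) = G P^{-1} y under the stated assumptions
merit_from_g_hat = w @ g_hat

# Route 2: fold the economic values into index coefficients b = P^{-1} G w.
b = np.linalg.solve(P, G @ w)
merit_from_index = b @ y

print(merit_from_g_hat, merit_from_index)  # the two routes agree
```

Both routes give the same predicted merit because $\mathbf{b}'\mathbf{y} = \mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{y}$ when $\mathbf{G}$ and $\mathbf{P}$ are symmetric, so the economic values can be applied after the breeding values have been predicted.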

A more difficult problem (although outside of the scope of the book) is that of inferring nonlinear merit. Suppose now that the merit of a candidate has the form:

$$H = \mathbf{w}'\mathbf{g} + \mathbf{g}'\mathbf{Q}\mathbf{g}$$

where $\mathbf{w}'$ is a known row vector, as above, and $\mathbf{Q}$ is a known matrix, assumed to be symmetric without loss of generality. The conditional distribution of $H$ given DATA does not have a closed form, but it can be estimated using Monte Carlo methods by drawing samples of $\mathbf{g}$ from some posterior distribution and, thus, obtaining samples of $H$ from the preceding expression. If $\widehat{\mathbf{g}} = E(\mathbf{g}/\text{DATA})$ and $\mathbf{C} = Var(\mathbf{g}/\text{DATA})$ are available, the mean and variance of the conditional distribution of $H$ can be calculated analytically. The mean is

$$E(H/\text{DATA}) = \mathbf{w}'\widehat{\mathbf{g}} + \widehat{\mathbf{g}}'\mathbf{Q}\widehat{\mathbf{g}} + tr(\mathbf{Q}\mathbf{C})$$

and, assuming multivariate normality

$$\begin{array}{rcl} Var(H/\text{DATA}) & = & Var(\mathbf{w}' \mathbf{g}) + Var(\mathbf{g}' \mathbf{Q} \mathbf{g}) + 2 \mathbf{w}' Cov(\mathbf{g}, \mathbf{g}' \mathbf{Q} \mathbf{g}) \\ & = & \mathbf{w}' \mathbf{C} \mathbf{w} + 2tr\left[(\mathbf{Q} \mathbf{C})^2\right] + 4 \widehat{\mathbf{g}}' \mathbf{Q} \mathbf{C} \mathbf{Q} \widehat{\mathbf{g}} + 4 \mathbf{w}' \mathbf{C} \mathbf{Q} \widehat{\mathbf{g}} \end{array}$$

Contrary to the case of a linear merit function, the precision of the evaluation of a candidate or, equivalently, the reliability of its evaluation, enters nontrivially when inferring second-order merit. Gianola and Fernando (1986) suggested the Bayesian approach as a general inferential method for solving a large number of animal breeding problems, linear or nonlinear, even in situations where there is uncertainty about all location and dispersion parameters. Today, the posterior distribution of any nonlinear merit function can be arrived at via Monte Carlo sampling.
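The Monte Carlo route for second-order merit can be sketched as follows. This is an illustration only: it assumes that the conditional distribution of $\mathbf{g}$ given DATA is multivariate normal with mean `g_hat` and covariance `C`, and all numerical values are hypothetical. The analytical moments use the standard normal-theory identities $E(\mathbf{g}'\mathbf{Q}\mathbf{g}) = \widehat{\mathbf{g}}'\mathbf{Q}\widehat{\mathbf{g}} + tr(\mathbf{Q}\mathbf{C})$, $Var(\mathbf{g}'\mathbf{Q}\mathbf{g}) = 2tr[(\mathbf{Q}\mathbf{C})^2] + 4\widehat{\mathbf{g}}'\mathbf{Q}\mathbf{C}\mathbf{Q}\widehat{\mathbf{g}}$, and $Cov(\mathbf{g}, \mathbf{g}'\mathbf{Q}\mathbf{g}) = 2\mathbf{C}\mathbf{Q}\widehat{\mathbf{g}}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior summaries for two traits (illustrative values only).
g_hat = np.array([0.5, -0.2])              # E(g/DATA)
C = np.array([[0.30, 0.05],
              [0.05, 0.20]])               # Var(g/DATA)
w = np.array([1.0, 2.0])                   # linear economic values
Q = np.array([[0.4, 0.1],
              [0.1, -0.3]])                # symmetric matrix of second-order weights


def merit(g):
    """Quadratic merit H = w'g + g'Qg for each row of g."""
    return g @ w + np.einsum("ni,ij,nj->n", g, Q, g)


# Analytical conditional mean and variance under normality.
QC = Q @ C
mean_exact = w @ g_hat + g_hat @ Q @ g_hat + np.trace(QC)
var_exact = (w @ C @ w
             + 2.0 * np.trace(QC @ QC)
             + 4.0 * g_hat @ Q @ C @ Q @ g_hat
             + 4.0 * w @ C @ Q @ g_hat)

# Monte Carlo approximation: sample g, evaluate H, summarize the draws.
draws = rng.multivariate_normal(g_hat, C, size=400_000)
H = merit(draws)
mean_mc, var_mc = H.mean(), H.var()

print(mean_exact, mean_mc)
print(var_exact, var_mc)
```

The sampled values of `H` approximate the whole conditional distribution, so quantiles or probabilities of exceeding a threshold can be read from the same draws; note that `C` enters both moments, which is how the precision of the evaluation affects the inference.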

Even when the statistical principles are well understood, it is often useful to understand the "architecture" of selection indices. The book is unique in presenting techniques needed to attain such an understanding, and represents a very valuable contribution to the statistical genetics of quantitative traits. It constitutes essential reading for plant quantitative geneticists working in multiple-trait improvement. However, animal breeders will also benefit from studying carefully many of its chapters, as these contribute knowledge in areas of animal breeding research where there has been little traffic. Personally, I am sure that much benefit will be extracted from studying this valuable and novel contribution to the literature.

Daniel Gianola

Department of Animal Sciences, University of Wisconsin, Madison, WI, USA

Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA

Department of Dairy Science, University of Wisconsin, Madison, WI, USA



### Preface

In the linear selection index (LSI) theory, the main distinction is between the net genetic merit and the LSI. The net genetic merit is a linear combination of the true unobservable breeding values of the traits weighted by their respective economic values, whereas the LSI is a linear combination of phenotypic values, marker scores or genomic estimated breeding values (GEBVs). The LSI can also be a linear combination of phenotypic values and marker scores or phenotypic values and GEBVs jointly. That is, the LSI is a function of observed phenotypic values, marker scores, or GEBVs that is used to predict the net genetic merit and select parents for the next generation. Thus, there are three main classes of LSI: phenotypic, marker, and genomic. The main advantage of the genomic LSI over the other indices lies in the possibility of reducing the intervals between selection cycles by more than two thirds. One of the main characteristics of the LSI is that it allows extra merit in one trait to offset slight defects in another. Thus, by its use, individuals with very high merit in one trait are saved for breeding, even when they are inferior in other traits (Hazel and Lush 1942).

Among the LSIs developed up to now, the main distinction is between an LSI that uses economic weights and one that does not use economic weights to predict the net genetic merit. The principal LSI theory was developed assuming that the economic weights are fixed and known; however, recently, the LSI theory was extended to the case where the economic weights are fixed but unknown. This latter theory is more general than the first because it does not require the economic weights to be known. An additional distinction among the LSIs is between the single-stage LSI and the multistage LSI. Multistage LSIs are methods for selecting one or more individual traits available at different times or stages; they are applied mainly in animal and tree breeding where the target traits become evident at different ages. One advantage of the latter method over the single-stage LSI is that the breeder does not need to carry a large population of individuals throughout the multi-trait selection process. Some authors have used multistage LSI as a cost-saving strategy for improving multiple traits, because not all traits need to be measured at each stage. When traits have a developmental sequence in ontogeny, or there are large differences in the costs of measuring several traits, the efficiency of multistage LSI over single-stage LSI can be substantial (Xu and Muir 1991, 1992).

The LSI has two main parameters: the selection response and the expected genetic gain per trait or multi-trait selection response. The selection response is associated with the mean of the net genetic merit and is defined as the mean of the progeny of the selected parents or the future population mean, whereas the expected genetic gain per trait, or multi-trait selection response, is the population mean of each trait under selection in the progeny of the selected parents. Thus, although the selection response is associated with the mean of the net genetic merit, the expected genetic gain per trait is associated with the mean of each trait under selection. The selection response and expected genetic gain enable breeders to estimate the expected progress of the selection before carrying it out. This information gives improvement programs a clearer orientation and helps to predict the success of the selection method adopted and to choose the option that is technically most effective on a scientific basis (Costa et al. 2008).

Based on the restriction imposed on the expected genetic gain per trait, the LSIs can be divided into unrestricted, null restricted, or predetermined proportional gains indices. The null restricted LSI allows restrictions equal to zero to be imposed on the expected genetic gain of some traits, whereas the expected genetic gain of other traits increases (or decreases) without imposing any restrictions. In a similar manner, the predetermined proportional gains LSI attempts to make some traits change their expected genetic gain values based on a predetermined level, whereas the rest of the traits remain without restrictions. All the foregoing indices have as their main objectives to predict the net genetic merit and select parents for the next generation.

The LSI theory is based on the multivariate normal distribution because this distribution allows the traits under selection to be completely described using only means, variances, and covariances. In addition, if the traits are uncorrelated, they are independent. Linear combinations of traits are also normal; and even when the trait phenotypic values do not have a multivariate normal distribution, this distribution serves as a useful approximation, especially in inferences involving sample mean vectors, which, in accordance with the central limit theorem, have an approximately multivariate normal distribution (Rencher 2002). By this reasoning, a fundamental assumption in the single-stage LSI theory is that the net genetic merit and the LSI have a bivariate normal distribution, whereas in the multistage LSI theory, the net genetic merit and the LSIs have a multivariate normal distribution. Under the latter assumption, the regression of the net genetic merit on any linear function of the phenotypic values is linear.

The LSI theory developed in this book is based on the Smith (1936) and Hazel (1943) linear phenotypic selection index (LPSI) described in Chap. 2. As the reader shall see, all the LSIs described in Chaps. 3–9 of this book are only variants of the LPSI. Thus, in Chap. 3, the restricted Kempthorne and Nordskog (1959) index only incorporates null restrictions on the LPSI expected genetic gain, and in a similar manner, the Mallard (1972) and Tallis (1985) index incorporates non-null predetermined restrictions on the LPSI expected genetic gain. In Chap. 4, Lande and Thompson (1990) and Lange and Whittaker (2001) have only incorporated molecular marker information into the LPSI, and in Chap. 5, the authors (Dekkers 2007; Togashi et al. 2011; Ceron-Rojas et al. 2015) incorporated GEBVs into the LPSI. In Chap. 6, the only novelty is that the Kempthorne and Nordskog (1959) and the Mallard (1972) and Tallis (1985) indices have been used in the genomic selection context, but such indices are based on the LPSI theory of Smith (1936) and Hazel (1943). In Chaps. 7 and 8, the only change was to assume that the economic weights are fixed but unknown, and then, based on this assumption, we have developed the eigen selection index method (ESIM) and its variants, which are, of course, associated with the LPSI. Finally, in Chap. 9 we show that Cochran (1951) and Young (1964) combined the LPSI theory with the independent culling method to develop the multistage selection index theory, but the base theory is the Smith (1936) and Hazel (1943) LPSI theory.

Note that up to now, we have used the acronym LPSI to denote the Smith (1936) and Hazel (1943) index, whereas the rest of the indices have been denoted by the names of their authors. We think that the use of this latter type of notation creates confusion for the reader, because it gives the impression that there are many theories associated with the indices or that all the indices were made ad hoc. In reality, there is only one theory, that developed by Smith (1936) and Hazel (1943), whereas the rest of the indices are only variants of this theory. In this book, we intend to solve this problem by using a specific acronym for each index (see Table 1.1, Chap. 1 for details) that indicates the relationship of each index (from Chaps. 3 to 9) with the LPSI. For example, the null restricted Kempthorne and Nordskog (1959) index is denoted by RLPSI (restricted linear phenotypic selection index), whereas the predetermined proportional gain Mallard (1972) and Tallis (1985) index is denoted by PPG-LPSI (predetermined proportional gains linear phenotypic selection index). Similar notation has been used for the molecular and genomic indices (see Table 1.1, Chap. 1 for additional detail). We hope that acronyms such as the RLPSI and PPG-LPSI help the reader to see that the latter two indices are only variants of the LPSI developed by Smith (1936) and Hazel (1943). To be specific, the RLPSI and PPG-LPSI are only projections of the LPSI to a different space. For example, the RLPSI projects the LPSI vector of coefficients to a smaller space than the original space of the LPSI vector of coefficients (see Chap. 3 for details).

The only thing that may seem strange to the reader is the set of acronyms ESIM (eigen selection index method), RESIM (restricted eigen selection index method), MESIM (molecular eigen selection index method), etc., that we have used in Chaps. 7 and 8, and which would seem to be unrelated to the LPSI, RLPSI, etc. However, we expect that the context and the theory described in the book will indicate to the reader the relationship among all the indices described in the book. As we shall see in Chaps. 7 and 8, ESIM and its variants are the result of an application of the canonical correlation theory to the LPSI context. This is the key to understanding the ESIM theory.

The main objective of this book is to describe the LSI theory and its statistical properties. First, we describe the single-stage LSI theory by assuming that economic weights are fixed and known to predict the net genetic merit in the phenotypic (Chaps. 2 and 3), marker (Chap. 4), and genomic (Chaps. 5 and 6) contexts. Next, we describe the LSI by assuming that economic weights are fixed but unknown to predict the net genetic merit in the phenotypic (Chap. 7), marker, and genomic (Chap. 8) contexts. In Chap. 9, we describe the multistage LSI in the phenotypic, marker, and genomic contexts assuming that economic weights are fixed and known. Chapters 10 and 11 present simulation results and SAS and R codes respectively to estimate the parameters and make selections using some of the LSIs described in Chaps. 2, 3, 4, 7, and 8.

> J. Jesus Cerón-Rojas José Crossa



### Acknowledgments

The authors are grateful to the past and present CIMMYT Directors General, Deputy Directors of Research, Directors of Research Programs, and Administration for their continuous and firm support of biometrics and statistics research, training and service in support of CIMMYT's mission: "maize and wheat science for improved livelihoods."

The work described in this book is the result of 30 years of research and data produced and provided by CIMMYT and diverse research partners. We are inspired and grateful for their tremendous commitment to advancing and applying science for public good.

This work was made possible with support from the CGIAR Research Programs on Wheat and Maize (wheat.org, maize.org), and many funders including Australia, United Kingdom (DFID), USA (USAID), South Africa, China, Mexico (SAGARPA), Canada, India, Korea, Norway, Switzerland, France, Japan, New Zealand, Sweden, The World Bank, and the Bill and Melinda Gates Foundation.

Finally, we wish to express our deep thanks to Prof. Dr. Bruce Walsh, who carefully read and corrected this book. We introduced many important suggestions and additions based on Dr. Walsh's extensive scientific knowledge.









### Chapter 1 General Introduction

Abstract We describe the main characteristics of two approaches to the linear selection indices theory. The first approach is called standard linear selection indices, whereas the second is called eigen selection index methods. In the first approach, the economic weights are fixed and known, whereas in the second approach the economic weights are fixed but unknown. This is the main difference between the two approaches and implies that the eigen selection index methods include the standard linear selection indices as a particular case, because they do not require that the economic weights be known. Both types of indices predict the net genetic merit and maximize the selection response, and they give the breeder an objective criterion to select individuals as parents for the next selection cycle. In addition, in the prediction they can use phenotypic, marker, and genomic information. In both approaches, the indices can be unrestricted, null restricted or predetermined proportional gains and can be used in the context of single-stage or multistage breeding selection schemes. We finish this chapter by describing the Lagrange multiplier method, which is the main tool for maximizing the selection index responses.

Linear selection indices that assume that economic weights are fixed and known to predict the net genetic merit are based on the linear selection index theory originally developed by Smith (1936), Hazel and Lush (1942), and Hazel (1943). They are called standard linear selection indices in this introduction. Linear selection indices that assume that economic weights are fixed but unknown are based on the linear selection index theory developed by Cerón-Rojas et al. (2008a, 2016) and are called Eigen selection index methods. The Eigen selection index methods include the standard linear selection indices as a particular case because they do not require the economic weights to be known. To understand the Eigen selection index methods theory, the point is to see that this is an application of the canonical correlation theory to the standard linear selection index context. The multistage linear selection index theory will be described only in the context of the standard linear selection indices. As we shall see, there are three main types of LSI: phenotypic, marker, and genomic. Each can be unrestricted, null restricted or predetermined proportional gains and can be used in the context of single-stage or multistage breeding selection schemes.

For each specific selection index described in this book, we have used an acronym. For example, the Smith (1936), Hazel and Lush (1942), and Hazel (1943) index was denoted LPSI (linear phenotypic selection index), whereas the Cerón-Rojas et al. (2008a) index was denoted ESIM (Eigen selection index method), etc. For additional details, see Table 1.1 and the Preface of this book. We think that such notation gives the reader a more general point of view of the relationship that exists among all the indices described in this book.


Table 1.1 Chapter where the index was described, authors who developed the selection index, acronym of the index used in this book, and description of the acronym


a Indices that use only phenotypic information

b Indices that use marker and phenotypic information jointly

c Indices that use only genomic information

d Indices that use genomic and phenotypic information jointly in the prediction of the net genetic merit

#### 1.1 Standard Linear Selection Indices

#### 1.1.1 Linear Phenotypic Selection Indices

Three main linear phenotypic selection indices used to predict the net genetic merit and select parents for the next selection cycle are the LPSI, the null restricted LPSI (RLPSI), and the predetermined proportional gains LPSI (PPG-LPSI). The LPSI is an unrestricted index. The RLPSI imposes restrictions equal to zero on the expected genetic gain of some traits, and the PPG-LPSI imposes predetermined proportional gain restrictions so that some trait means change by a predetermined amount; in both cases, the rest of the trait means remain without restrictions. All these indices are linear combinations of several observable and optimally weighted phenotypic trait values.

The simplest linear phenotypic selection index (LPSI) can be written as $I_B = \mathbf{w}'\mathbf{y}$, where $\mathbf{w}$ is a known vector of economic values and $\mathbf{y}$ is a vector of phenotypic values. We called this index the base linear phenotypic selection index (BLPSI). In this case, the breeder does not need to estimate any parameters, and some authors have indicated that the BLPSI is a good predictor of the net genetic merit ($H = \mathbf{w}'\mathbf{g}$, where $\mathbf{g}$ is a vector of true unobservable breeding values) when no data are available for estimating the phenotypic ($\mathbf{P}$) and genotypic ($\mathbf{G}$) covariance matrices. When the traits are independent and the economic weights are also known, the LPSI can be written as $I = \sum_{i=1}^{t} w_i h_i^2 y_i$, and when the economic weights are not known, the LPSI is $I = \sum_{i=1}^{t} h_i^2 y_i$, where $w_i$ is the $i$th economic weight and $h_i^2$ is the heritability of trait $y_i$.

In Chap. 2 (Sects. 2.5.1 and 2.5.2), we will show that the foregoing three indices are particular cases of the more general LPSI, i.e., $I = \mathbf{b}'\mathbf{y}$, where $\mathbf{b}$ is the vector of coefficients of $I$ and $\mathbf{y}$ is the vector of observable trait phenotypic values. In the latter case, we need to estimate matrices $\mathbf{P}$ and $\mathbf{G}$.
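How the simpler indices arise from the general LPSI can be sketched numerically. The example below assumes the standard Smith-Hazel solution $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$, which is not derived in this chapter; the helper name `lpsi_coefficients` and all variances, covariances, and weights are hypothetical.

```python
import numpy as np


def lpsi_coefficients(P, G, w):
    """Return b = P^{-1} G w (standard Smith-Hazel form, assumed here)."""
    return np.linalg.solve(P, G @ w)


w = np.array([1.0, 0.5, -2.0])            # economic weights (illustrative)

# Case 1: independent traits, so P and G are diagonal.
var_g = np.array([2.0, 1.5, 0.6])         # additive genetic variances
var_p = np.array([5.0, 6.0, 1.0])         # phenotypic variances
h2 = var_g / var_p                        # heritabilities
b_indep = lpsi_coefficients(np.diag(var_p), np.diag(var_g), w)
print(b_indep, w * h2)                    # coefficients reduce to w_i * h_i^2

# Case 2: correlated traits, where P and G must be estimated in practice.
G = np.array([[2.0, 0.6, -0.2],
              [0.6, 1.5, 0.1],
              [-0.2, 0.1, 0.6]])
P = np.array([[5.0, 1.0, -0.3],
              [1.0, 6.0, 0.2],
              [-0.3, 0.2, 1.0]])
b = lpsi_coefficients(P, G, w)

y = np.array([0.8, -1.1, 0.3])            # one candidate's phenotypic deviations
lpsi_value = b @ y                        # I = b'y
blpsi_value = w @ y                       # I_B = w'y needs no parameter estimates
print(lpsi_value, blpsi_value)
```

With diagonal matrices each coefficient is $w_i h_i^2$, which is the independent-traits index given above; the BLPSI skips the estimation step entirely and weights the phenotypes by $\mathbf{w}$ alone.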

The LPSI was originally proposed by Smith (1936) in the plant breeding context; later Hazel and Lush (1942) and Hazel (1943) extended the LPSI to the context of animal breeding. These authors made a clear distinction between the LPSI and the net genetic merit. The net genetic merit was defined as a linear combination of the unobservable true breeding values of the traits weighted by their respective economic values. In the LPSI theory, the main assumptions are: the genotypic values that make up the net genetic merit are composed entirely of the additive effects of genes, the LPSI and the net genetic merit have a joint normal distribution, and the regression of the net genetic merit on LPSI values is linear. Two of the main parameters of this index are the selection response and the expected genetic gain per trait or multi-trait selection response. The LPSI selection response is associated with the mean of the net genetic merit and was defined as the mean of the progeny of the selected parents or the mean of the future population (Cochran 1951). The selection response enables breeders to estimate the expected selection progress before carrying it out. This information gives improvement programs a clearer orientation and helps to predict the success of the adopted selection method and choose the option that is technically most effective on a scientific basis (Costa et al. 2008). On the other hand, the LPSI expected genetic gain per trait, or multi-trait selection response, is the population mean of each trait under selection of the progeny of the selected parents. Thus, although the LPSI selection response is associated with the mean of the net genetic merit, the LPSI expected genetic gain per trait is associated with the mean of each trait under selection. The foregoing definition of selection response and the expected genetic gain per trait are valid for all selection indices described in this book.
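The two parameters described above can be computed once $\mathbf{P}$, $\mathbf{G}$, and $\mathbf{w}$ are available. The sketch below assumes the standard expressions $R = k\,\mathbf{w}'\mathbf{G}\mathbf{b}/\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ for the selection response and $\mathbf{E} = k\,\mathbf{G}\mathbf{b}/\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ for the expected genetic gain per trait, with $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ and $k$ the standardized selection intensity; these formulas, the function name, and the numbers are assumptions for illustration and are not stated in this section.

```python
import numpy as np


def lpsi_response_and_gains(P, G, w, k):
    """Selection response and per-trait expected gains for the LPSI.

    Assumes b = P^{-1} G w and truncation selection with intensity k.
    """
    b = np.linalg.solve(P, G @ w)
    sd_index = np.sqrt(b @ P @ b)          # standard deviation of I = b'y
    response = k * (w @ G @ b) / sd_index  # expected change in mean net genetic merit
    gains = k * (G @ b) / sd_index         # expected change in each trait mean
    return b, response, gains


# Hypothetical parameters for three traits.
G = np.array([[2.0, 0.6, -0.2],
              [0.6, 1.5, 0.1],
              [-0.2, 0.1, 0.6]])
P = np.array([[5.0, 1.0, -0.3],
              [1.0, 6.0, 0.2],
              [-0.3, 0.2, 1.0]])
w = np.array([1.0, 0.5, -2.0])
k = 1.755                                  # intensity for selecting roughly the top 10%

b, response, gains = lpsi_response_and_gains(P, G, w, k)
print(response)
print(gains)
```

Under these expressions the selection response equals the economically weighted sum of the per-trait gains, $R = \mathbf{w}'\mathbf{E}$, which makes explicit that the first parameter refers to the net genetic merit and the second to the individual trait means.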

One of the main problems of the LPSI is that when used to select individuals as parents for the next selection cycle, the expected mean of the traits can increase or decrease in a positive or negative direction without control. This was the main reason why Kempthorne and Nordskog (1959) developed the basics of the restricted LPSI (RLPSI), which allows restrictions to be imposed equal to zero on the expected genetic gain of some traits whereas the expected genetic gain of other traits increases (or decreases) without any restrictions being imposed. Based on the results of the RLPSI, Tallis (1962) and James (1968) proposed a selection index called predetermined proportional gains LPSI (PPG-LPSI), which attempts to make some traits change their expected genetic gain values based on a predetermined level, while the rest of the traits remain without restrictions. Mallard (1972) pointed out that the PPG-LPSI proposed by Tallis (1962) and James (1968) does not provide optimal genetic gains and was the first to propose an optimal PPG-LPSI based on a slight modification of the RLPSI. Other optimal PPG-LPSIs were proposed by Harville (1975) and Tallis (1985). Itoh and Yamada (1987) showed that the Mallard (1972) index is equal to the Tallis (1985) index and that, except for a proportional constant, the Tallis (1985) index is equal to the Harville (1975) index. Thus, in reality, there is only one optimal PPG-LPSI.

In Chap. 3 (Sect. 3.1.1 and 3.2.1), we show that $\mathbf{b}_R = \mathbf{K}\mathbf{b}$ and $\mathbf{b}_P = \mathbf{K}_P\mathbf{b}$ are the vectors of coefficients of the RLPSI and PPG-LPSI, respectively, where $\mathbf{b}$ is the LPSI vector of coefficients. Matrices $\mathbf{K}$ and $\mathbf{K}_P$ are idempotent ($\mathbf{K} = \mathbf{K}^2$ and $\mathbf{K}_P = \mathbf{K}_P^2$), that is, they are projectors. Matrix $\mathbf{K}$ projects $\mathbf{b}$ into a space smaller than the original space of $\mathbf{b}$ because the restrictions imposed on the expected genetic gains per trait are equal to zero (Sect. 3.1.1). The reduction of the space into which matrix $\mathbf{K}$ projects $\mathbf{b}$ will be equal to the number of null restrictions imposed by the breeder on the expected genetic gain per trait, or multi-trait selection response. In the PPG-LPSI context, matrix $\mathbf{K}_P$ has the same function as $\mathbf{K}$ (see Sect. 3.2.1 for details).

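The aims of the LPSI, RLPSI, and PPG-LPSI are to predict the net genetic merit of the candidates for selection, maximize the selection response and the expected genetic gain per trait, and provide the breeder with an objective rule for evaluating and selecting parents for the next selection cycle based on several traits.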

The LPSI is described in Chap. 2, and the RLPSI and PPG-LPSI are described in Chap. 3. As we will see, the RLPSI and PPG-LPSI theories can be extended to all selection indices described in this book. Also, the main objectives of all these selection indices are the same as those of the LPSI, RLPSI, and PPG-LPSI.

#### 1.1.2 Linear Marker Selection Indices

The linear marker selection index (LMSI) and the genome-wide LMSI (GW-LMSI) are employed in marker-assisted selection (MAS) and are useful in training populations when there is phenotypic and marker information; both are a direct application of the LPSI theory to the MAS context. The LMSI was originally proposed by Lande and Thompson (1990), and the GW-LMSI was proposed by Lange and Whittaker (2001). The fundamental idea of these authors is based on the fact that crossing two inbred lines generates linkage disequilibrium between markers and quantitative trait loci (QTL), which is useful for identifying markers correlated with the traits of interest and estimating the correlation between each of the selected markers and the trait; the selection criteria are then based upon this marker information (Moreau et al. 2007). The LMSI combines information on markers linked to QTL and the phenotypic values of the traits to predict the net genetic merit of the candidates for selection because it is not possible to identify all QTL affecting the economically important traits (Li 1998). That is, unless all QTL affecting the traits of interest can be identified, phenotypic values should be combined with the marker scores to increase LMSI efficiency (Dekkers and Settar 2004).

Moreau et al. (2000) and Whittaker (2003) found that the LMSI is more effective than LPSI only in early generation testing and that LMSI increased costs because of molecular marker evaluation. The LMSI assumes that favorable alleles are known, as are their average effects on phenotype (Lande and Thompson 1990; Hospital et al. 1997). This assumption is valid for major gene traits but not for quantitative traits that are influenced by the environment and many QTLs with small effects interacting among them and with the environment. The LMSI requires regressing phenotypic values on marker-coded values and, with this information, constructing the marker score for each individual candidate for selection, and then combining the marker score with phenotypic information using the LMSI to obtain a final prediction of the net genetic merit. Several authors (Lange and Whittaker 2001; Meuwissen et al. 2001; Dekkers 2007; Heffner et al. 2009) have criticized the LMSI approach because it makes inefficient use of the available data. It would be preferable to use all the available data in a single step to achieve maximally accurate estimates of marker effects. In addition, because the LMSI is based on only a few large QTL effects, it violates the selection index assumptions of multivariate normality and small changes in allele frequencies.

Lange and Whittaker (2001) proposed the genome-wide LMSI (GW-LMSI) as a possible solution to LMSI problems. The GW-LMSI is a single-stage procedure that treats information at each individual marker as a separate trait. Thus, all marker information can be entered together with phenotypic information into the GW-LMSI, which is then used to predict the net genetic merit and select candidates. Both selection indices are described in Chap. 4.

#### 1.1.3 Linear Genomic Selection Indices

The linear genomic selection index (LGSI) is a linear combination of genomic estimated breeding values (GEBVs) and was originally proposed by Togashi et al. (2011); however, Ceron-Rojas et al. (2015) developed the LGSI theory completely. The advantage of the LGSI over the other indices lies in the possibility of reducing the intervals between selection cycles by more than two thirds. A 4-year breeding cycle (including 3 years of field testing) is thus reduced to only 4 months, i.e., the time required to grow and cross a plant. As a result, thousands of candidates for selection can be evaluated without ever taking them out to the field (Lorenz et al. 2011).

In the LGSI, phenotypic and marker data from the training population are fitted in a statistical model to estimate all available marker effects; these estimates are then used to obtain GEBVs that are predictors of breeding values in a testing population for which there is only marker information. The GEBV can be obtained by multiplying the genomic best linear unbiased predictor (GBLUP) of the estimated marker effects in the training population (Van Raden 2008) by the coded marker values obtained in the testing population in each selection cycle. Applying the LGSI in plant or animal breeding requires genotyping the candidates for selection to obtain the GEBV, and predicting and ranking the net genetic merit of the candidates for selection using the LGSI. An additional genomic selection index was given by Dekkers (2007); however, this index can only be used in training populations because GEBV and phenotypic information are jointly used to predict the net genetic merit. Both indices are described in Chap. 5; in Chap. 6, we describe them in the context of the restricted selection indices.
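The two steps described above (estimating marker effects in the training population, then multiplying them by the coded markers of the testing population) can be sketched as follows. This is only an illustration under stated assumptions: the marker and phenotype data are simulated, a simple ridge-type estimator with an arbitrary shrinkage value `lam` stands in for the GBLUP procedure, and the economic weights `w` used to combine the GEBVs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_markers, n_traits = 200, 50, 30, 2

# Coded marker values (-1, 0, 1); simulated here purely for illustration
X_train = rng.integers(-1, 2, size=(n_train, n_markers)).astype(float)
X_test = rng.integers(-1, 2, size=(n_test, n_markers)).astype(float)

# Simulated "true" marker effects and training phenotypes for two traits
U_true = rng.normal(0.0, 0.3, size=(n_markers, n_traits))
Y_train = X_train @ U_true + rng.normal(0.0, 1.0, size=(n_train, n_traits))

# Ridge-type estimate of marker effects (stand-in for GBLUP; lam is assumed)
lam = 1.0
Y_centered = Y_train - Y_train.mean(axis=0)
U_hat = np.linalg.solve(X_train.T @ X_train + lam * np.eye(n_markers),
                        X_train.T @ Y_centered)

# GEBVs of non-phenotyped candidates: coded markers times estimated effects
gebv = X_test @ U_hat

# A linear combination of GEBVs, weighted by hypothetical economic weights
w = np.array([1.0, -0.5])
lgsi = gebv @ w

# Rank candidates from highest to lowest index value
ranking = np.argsort(-lgsi)
```

Only marker data are needed for the testing population; phenotypes enter solely through the estimated effects `U_hat`.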

#### 1.2 Eigen Selection Index Methods

The eigen selection index methods are described in Chaps. 7 and 8. As we shall see, these indices are only used in training populations and can be unrestricted, restricted, and predetermined proportional gains selection indices; they can also use phenotypic and/or marker information to predict the net genetic merit. In the context of this linear selection index theory, it is assumed that economic weights are fixed but unknown. The eigen selection index methods are based on canonical correlation theory applied to the context of the LPSI, RLPSI, and related selection indices.

#### 1.2.1 Linear Phenotypic Eigen Selection Index Method

Cerón-Rojas and Sahagún-Castellanos (2005) and Cerón-Rojas et al. (2006) proposed a phenotypic selection index in the principal component context that has low accuracy; later, Cerón-Rojas et al. (2008a, 2016) developed the eigen selection index method (ESIM), the restricted ESIM (RESIM) and the predetermined proportional gain ESIM (PPG-ESIM) in the canonical correlations context (Hotelling 1935, 1936). The ESIM is an unrestricted index, but the RESIM and PPG-ESIM allow null and predetermined restrictions respectively to be imposed on the expected genetic gains of some traits, whereas the rest remain without restrictions. The latter three indices use only phenotypic information to predict the individual net genetic merit of the candidate for selection and use the elements of the first eigenvector of the multi-trait heritability as the index vector of coefficients and the first eigenvalue of the multi-trait heritability in their selection response. The main objectives of the three indices are to predict the unobservable net genetic merit values of the candidates for selection, maximize the selection response and the expected genetic gain per trait, and provide the breeder with an objective rule for evaluating and selecting several traits simultaneously. Their main characteristics are:


Finally, the main theory described in Chap. 7 was developed by Cerón-Rojas et al. (2008a, 2016) based on the canonical correlation framework. That is, ESIM and its variants (RESIM, MESIM, PPG-ESIM) are applications of the canonical correlation theory to the LPSI context.
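As a minimal sketch of the eigen approach, the following Python code extracts the first eigenvalue and eigenvector of a multi-trait heritability matrix. It assumes that this matrix is $\mathbf{P}^{-1}\mathbf{G}$, where $\mathbf{P}$ and $\mathbf{G}$ are the phenotypic and genotypic covariance matrices; that form is not given in this section and the numerical values are hypothetical.

```python
import numpy as np

# Hypothetical phenotypic (P) and genotypic (G) covariance matrices, 3 traits
P = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.8],
              [0.5, 0.8, 2.0]])
G = np.array([[1.5, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 0.6]])

# Solve the eigenproblem of P^{-1}G through an equivalent symmetric problem:
# with P = L L', the matrix S = L^{-1} G L^{-T} has the same eigenvalues.
L = np.linalg.cholesky(P)
L_inv = np.linalg.inv(L)
S = L_inv @ G @ L_inv.T
eigenvalues, eigenvectors = np.linalg.eigh(S)  # ascending order

first_eigenvalue = eigenvalues[-1]
# Map the eigenvector of S back to an eigenvector of P^{-1}G
b_esim = L_inv.T @ eigenvectors[:, -1]

# The first eigenvalue is the largest attainable value of (b'Gb) / (b'Pb)
ratio = (b_esim @ G @ b_esim) / (b_esim @ P @ b_esim)
```

The symmetric reformulation is a numerical convenience chosen for this sketch; it avoids the complex-valued output that a general eigenvalue routine may return for the non-symmetric matrix $\mathbf{P}^{-1}\mathbf{G}$.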

#### 1.2.2 Linear Marker and Genomic Eigen Selection Index Methods

Cerón-Rojas et al. (2008b) and Crossa and Cerón-Rojas (2011) extended the ESIM to a molecular ESIM (MESIM) and to a genome-wide ESIM (GW-ESIM), respectively, similar to the linear marker selection index (LMSI) and to the genome-wide LMSI (GW-LMSI). The MESIM and GW-ESIM have problems similar to those associated with the LMSI and GW-LMSI, respectively (see Chap. 4 for details). The MESIM and GW-ESIM use phenotypic information and markers linked to QTL to predict the net genetic merit, but the GW-ESIM omits the molecular selection step in the prediction. The main difference among the MESIM, the GW-ESIM, the LMSI, and the GW-LMSI is how they obtain the vector of coefficients: while the LMSI and GW-LMSI obtain the vector of coefficients according to the LPSI theory, the MESIM and the GW-ESIM obtain the vector of coefficients based on canonical correlation analysis and the singular value decomposition theory.

It is possible to extend the ESIM to a genomic ESIM (GESIM), and the RESIM and the PPG-ESIM can be extended to a restricted genomic ESIM (RGESIM) and to a predetermined proportional gain genomic ESIM (PPG-GESIM), respectively. These indices use phenotypic and GEBV information jointly to predict the net genetic merit of the candidates for selection, maximizing the selection response and optimizing the expected genetic gain per trait. Although the GESIM is not constrained, the RGESIM and the PPG-GESIM allow null and predetermined restrictions, respectively, to be imposed on the expected genetic gain to make some traits change their mean values based on a predetermined level, while the rest of the traits remain without any restriction.

#### 1.3 Multistage Linear Selection Indices

Multistage linear selection indices are methods of selecting one or more individual traits available at different times or stages and are applied mainly in animal and tree breeding, where the traits under consideration become evident at different ages. The theory of these indices is based on the independent culling level method and the standard linear selection index theory. There are two main approaches associated with these indices:


These indices can use phenotypic or GEBV information to predict the net genetic merit, or combine phenotypic and GEBV information in the prediction. They can also be unrestricted, null restricted, or predetermined proportional gains indices. In this book, we describe only the optimal multistage linear selection index (Chap. 9), which we call simply the multistage linear selection index.

Multistage linear selection indices are a cost-saving strategy for improving multiple traits, because not all traits need to be measured at each stage. Thus, when traits have a developmental sequence in ontogeny or there are large differences in the costs of measuring several traits, the gain in efficiency of this index over the LPSI can be substantial (Xu et al. 1995). Xu and Muir (1992) have indicated that the optimal multistage linear phenotypic selection index (MLPSI) increases selection intensity on traits measured at an earlier age, and, with fixed facilities, a greater number of individuals can be selected at an earlier age. For example, if some individuals can be culled before final traits are measured (e.g., weaning weights in swine and beef cattle breeding), savings are realized in terms of feed, labor, and facilities. With the LPSI, the same individuals must be measured for each trait; thus, the number of traits measured per mature individual is the same as that for an immature individual.

The original MLPSI was developed by Cochran (1951) in the two-stage context and later, Young (1964) and Cunningham (1975) combined the LPSI theory with the independent culling method to simultaneously select more than one trait in the multistage selection context. This selection method was called multistage selection by Cochran (1951) and Young (1964) and multistage index selection by Cunningham (1975).

The MLPSI theory can also be adapted to the genomic selection context, where it is possible to develop an optimal multistage unrestricted, restricted, and predetermined proportional gains linear genomic selection index. The latter indices are linear combinations of genomic estimated breeding values (GEBVs) used to predict the individual net genetic merit and select individual traits available at different stages in a non-phenotyped testing population, and are called multistage linear genomic selection indices. The advantage of these indices over the other selection indices lies in the possibility of reducing the intervals between selection cycles or stages by more than two thirds.

One of the main problems of all the multistage selection indices is that after the first selection stage their values could be non-normally distributed. In addition, for more than two stages, those indices require computationally sophisticated multiple integration techniques to derive selection intensities, and there are problems of convergence when the traits and the index values of successive stages are highly correlated. Furthermore, the computational time could be unacceptable if the number of selection stages becomes too high (Börner and Reinsch 2012). One possible solution to these problems was given by Xu and Muir (1992) in the selection index updating or decorrelated multistage linear phenotypic selection index context. However, one problem with the decorrelated multistage selection index is that its accuracy and selection response are generally lower than the accuracy and selection response of the multistage selection index described in this book.

#### 1.4 Stochastic Simulation of Four Linear Phenotypic Selection Indices

Chapter 10 describes a stochastic simulation of four linear indices: LPSI, ESIM, RLPSI, and RESIM. We think that stochastic simulation can contribute to a better understanding of the relationship between these indices and of their accuracy in predicting the net genetic merit.

#### 1.5 RIndSel: Selection Indices with R

Chapter 11 describes how RIndSel can be used to determine individual candidates as parents for the next cycle of improvement. RIndSel is a graphical user interface that uses the selection index theory to make selections. The index can be a linear combination of phenotypic values, genomic estimated breeding values or a linear combination of phenotypic values and marker scores.

#### 1.6 The Lagrange Multiplier Method

To obtain the constrained linear selection indices (e.g., RLPSI, PPG-LPSI, RESIM) described in Chaps. 3, 6, 7, 8, and 9, we used the method of Lagrange multipliers. This is a powerful method for finding extreme values (maxima or minima) of constrained functions. For example, the covariance between the breeding value vector ($\mathbf{g}$) and the LPSI ($I = \mathbf{b}'\mathbf{y}$) is $Cov(I, \mathbf{g}) = \mathbf{G}\mathbf{b}$. In the LPSI context, the $\mathbf{G}\mathbf{b}$ vector can take any value (positive or negative), which could be a problem for some breeding objectives. That is, the breeder could be interested in improving only $(t - r)$ of $t$ ($r < t$) traits, leaving $r$ of them fixed; that is, the expected genetic gains of $r$ traits will be equal to zero for a specific selection cycle. In such cases, we want $r$ covariances between the linear combinations of $\mathbf{g}$ ($\mathbf{U}'\mathbf{g}$) and $I = \mathbf{b}'\mathbf{y}$ to be zero, i.e., $Cov(I, \mathbf{U}'\mathbf{g}) = \mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{0}$, where $\mathbf{U}'$ is a matrix with $r$ 1's and $(t - r)$ 0's; 1 indicates that the trait is restricted and 0 that the trait is not restricted. This is the main problem of the RLPSI, and the method of Lagrange multipliers is useful for solving that problem.

In the constrained linear selection indices context, the method of Lagrange multipliers involves maximizing (or minimizing) the Lagrange function $L[H, I, \mathbf{g}, \mathbf{v}] = f(H, I) + \mathbf{v}'g(\mathbf{g}, I)$, where the elements of vector $\mathbf{v}'$ are called Lagrange multipliers. In the RLPSI context, $f(H, I) = E[(H - I)^2] = \mathbf{w}'\mathbf{G}\mathbf{w} + \mathbf{b}'\mathbf{P}\mathbf{b} - 2\mathbf{w}'\mathbf{G}\mathbf{b}$ is the mean squared difference between $I$ and $H$. Let $g(\mathbf{g}, I) = Cov(I, \mathbf{U}'\mathbf{g}) = \mathbf{U}'\mathbf{G}\mathbf{b}$ be the covariances between the linear combinations of $\mathbf{g}$ ($\mathbf{U}'\mathbf{g}$) and $I = \mathbf{b}'\mathbf{y}$, the LPSI. Then, to find the RLPSI vector of coefficients $\mathbf{b}_R = \mathbf{K}\mathbf{b}$, we need to minimize the Lagrange function $\mathbf{b}'\mathbf{P}\mathbf{b} + \mathbf{w}'\mathbf{G}\mathbf{w} - 2\mathbf{w}'\mathbf{G}\mathbf{b} + 2\mathbf{v}'\mathbf{C}'\mathbf{b}$, where $\mathbf{C}' = \mathbf{U}'\mathbf{G}$, with respect to vectors $\mathbf{b}$ and $\mathbf{v}' = [v_1 \; v_2 \; \cdots \; v_{r-1}]$, where $\mathbf{v}$ is a vector of Lagrange multipliers (see Chap. 3, Sect. 3.1.1 for details). Schott (2005) has given additional details associated with the method of Lagrange multipliers.
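The minimization above can be sketched numerically. Setting the derivatives of the Lagrange function to zero gives, under the assumptions of this sketch, $\mathbf{b}_R = \mathbf{K}\mathbf{b}$ with $\mathbf{K} = \mathbf{I} - \mathbf{P}^{-1}\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{C}'$, $\mathbf{C} = \mathbf{G}\mathbf{U}$, and $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$. These closed forms are derived in Chap. 3 rather than in this section, so they are taken here as assumptions; the matrices, the economic weights, and the choice of restricted trait are hypothetical.

```python
import numpy as np

# Hypothetical phenotypic (P) and genotypic (G) covariance matrices, t = 3 traits
P = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.8],
              [0.5, 0.8, 2.0]])
G = np.array([[1.5, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 0.6]])
w = np.array([1.0, 0.5, 2.0])        # hypothetical economic weights

# Restrict the second trait (r = 1): 1 marks a restricted trait, 0 an unrestricted one
U = np.array([[0.0], [1.0], [0.0]])  # t x r, so U' is r x t
C = G @ U                            # C' = U'G

# Unrestricted LPSI coefficients (assumed closed form b = P^{-1} G w)
b = np.linalg.solve(P, G @ w)

# Projector K = I - P^{-1} C (C' P^{-1} C)^{-1} C'
P_inv_C = np.linalg.solve(P, C)
K = np.eye(3) - P_inv_C @ np.linalg.solve(C.T @ P_inv_C, C.T)

# Restricted coefficients and the resulting covariances with each trait
b_R = K @ b
gain_direction = G @ b_R             # proportional to the expected gain per trait
```

In this sketch, the second element of `gain_direction` is zero up to rounding error, while the other two traits remain free to change.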

#### References


Young SSY (1964) Multi-stage selection for genetic gain. Heredity 19:131–143


### Chapter 2 The Linear Phenotypic Selection Index Theory

Abstract The main distinction in the linear phenotypic selection index (LPSI) theory is between the net genetic merit and the LPSI. The net genetic merit is a linear combination of the true unobservable breeding values of the traits weighted by their respective economic values, whereas the LPSI is a linear combination of several observable and optimally weighted phenotypic trait values. It is assumed that the net genetic merit and the LPSI have a bivariate normal distribution; thus, the regression of the net genetic merit on the LPSI is linear. The aims of the LPSI theory are to predict the net genetic merit, maximize the selection response and the expected genetic gains per trait (or multi-trait selection response), and provide the breeder with an objective rule for evaluating and selecting parents for the next selection cycle based on several traits. The selection response is the mean of the progeny of the selected parents, whereas the expected genetic gain per trait, or multi-trait selection response, is the population mean of each trait under selection in the progeny of the selected parents. The LPSI allows extra merit in one trait to offset slight defects in another; thus, with its use, individuals with very high merit in one trait are saved for breeding even when they are slightly inferior in other traits. This chapter describes the LPSI theory and practice. We illustrate the theoretical results of the LPSI using real and simulated data. We end this chapter with a brief description of the quadratic selection index and its relationship with the LPSI.

#### 2.1 Bases for Construction of the Linear Phenotypic Selection Index

The study of quantitative traits (QTs) in plants and animals is based on the mean and variance of phenotypic values of QTs. Quantitative traits are phenotypic expressions of plant and animal characteristics that show continuous variability and are the result of many gene effects interacting among them and with the environment. That is, QTs are the result of unobservable gene effects distributed across plant or animal genomes that interact among themselves and with the environment to produce the observable characteristic plant and animal phenotypes (Mather and Jinks 1971; Falconer and Mackay 1996).

Fig. 2.1 Distribution of 252 phenotypic means of two maize (Zea mays) F2 population traits: plant height (PHT, cm; a) and ear height (EHT, cm; b), evaluated in one environment, and of 599 phenotypic means of the grain yield (GY1 and GY2, ton ha<sup>-1</sup>; c and d respectively) of one double haploid wheat (Triticum aestivum L.) population evaluated in two environments

The QTs are the traits that concern plant and animal breeders the most. They are particularly difficult to analyze because heritable variations of QTs are masked by larger nonheritable variations that make it difficult to determine the genotypic values of individual plants or animals (Smith 1936). However, as QTs usually have normal distribution (Fig. 2.1), it is possible to apply normal distribution theory when analyzing this type of data.

Any phenotypic value of QTs (y) can be divided into two main parts: one related to the genes and the interactions (g) among them (called genotype), and the other related to the environmental conditions (e) that affect genetic expression (called environment effects). Thus, the genotype is the particular assemblage of genes possessed by the plant or animal, whereas the environment consists of all the nongenetic circumstances that influence the phenotypic value of the plant or animal (Cochran 1951; Bulmer 1980; Falconer and Mackay 1996). In the context of only one environment, the phenotypic value of QTs (y) can be written as

$$\mathbf{y} = \mathbf{g} + \mathbf{e},\tag{2.1}$$

where g denotes the genotypic values that include all types of gene and interaction values, and e denotes the deviations from the mean of g values. For two or more environments, Eq. (2.1) can be written as $\mathbf{y} = \mathbf{g} + \mathbf{e} + \mathbf{ge}$, where ge denotes the interaction between genotype and environment. Assumptions regarding Eq. (2.1) are:


The g value can be partitioned into three additional components: additive genetic (a) effects (or intra-locus additive allelic interaction), dominant genetic (d) effects (or intra-locus dominance allelic interaction), and epistasis (ι) effects (or inter-loci allelic interaction) such that $\mathbf{g} = \mathbf{a} + \mathbf{d} + \boldsymbol{\iota}$. In this book, we have assumed that $\mathbf{g} = \mathbf{a}$.

According to Kempthorne and Nordskog (1959), the following four theoretical conditions are necessary to construct a valid LPSI:


Under assumptions 1 to 4, the offspring of a mating will have a genotypic value equal to the average of the breeding values of the parents (Kempthorne and Nordskog 1959). Additional conditions for practical objectives are:


Conditions 5 to 10 indicate that the LPSI is applied in a single-stage context.

#### 2.2 The Net Genetic Merit and the LPSI

Not all the individual traits under selection are equally important from an economic perspective; thus, the economic value of a trait determines how important that trait is for selection. Economic value is defined as the increase in profit achieved by improving a particular trait by one unit (Tomar 1983; Cartuche et al. 2014). This means that for several traits, the total economic value is a linear combination of the breeding values of the traits weighted by their respective economic values (Smith 1936; Hazel and Lush 1942; Hazel 1943; Kempthorne and Nordskog 1959); this is called the net genetic merit of one individual and can be written as

$$H = \mathbf{w}'\mathbf{g},\tag{2.2}$$

where $\mathbf{g}' = [g_1 \; g_2 \; \cdots \; g_t]$ is a vector of true unobservable breeding values and $\mathbf{w}' = [w_1 \; w_2 \; \cdots \; w_t]$ is a vector of known and fixed economic weights. Equation (2.2) has several names, e.g., linear aggregate genotype (Hazel 1943), genotypic economic value (Kempthorne and Nordskog 1959), net genetic merit (Akbar et al. 1984; Cotterill and Jackson 1985), breeding objective (Mac Neil et al. 1997), and total economic merit (Cunningham and Tauebert 2009), among others. In this book, we call Eq. (2.2) net genetic merit only. The values of $H = \mathbf{w}'\mathbf{g}$ are unobservable but they can be simulated for specific studies, as is seen in the examples included in this chapter and in Chap. 10, where four indices have been simulated for many selection cycles.

In practice, the net genetic merit of an individual is not observable; thus, to select an individual as parent of the next generation, it is necessary to consider its overall merit based on several observable traits; that is, we need to construct an LPSI of observable phenotypic values such that the correlation between the LPSI and $H = \mathbf{w}'\mathbf{g}$ is at a maximum. The LPSI should be a good predictor of $H = \mathbf{w}'\mathbf{g}$ and should be useful for ranking and selecting among individuals with different net genetic merits. The LPSI for one individual can be written as

$$I = \mathbf{b}'\mathbf{y},\tag{2.3}$$

where $\mathbf{b}' = [b_1 \; b_2 \; \cdots \; b_t]$ is the $I$ vector of coefficients, $t$ is the number of traits on $I$, and $\mathbf{y}' = [y_1 \; y_2 \; \cdots \; y_t]$ is a vector of observable trait phenotypic values usually centered with respect to its mean. The LPSI allows extra merit in one trait to offset slight defects in another. With its use, individuals with very high merit in some traits are saved for breeding, even when they are slightly inferior in other traits (Hazel and Lush 1942). Only one combination of $\mathbf{b}$ values allows the correlation of the LPSI with $H = \mathbf{w}'\mathbf{g}$ for a particular set of traits to be maximized.
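The relationship between Eqs. (2.2) and (2.3) can be illustrated with a small simulation. The following Python sketch is illustrative only: the covariance matrices `G` and `E`, the economic weights `w`, and the sample size are hypothetical, and the coefficients are computed with the standard closed form $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$, which has not been derived at this point of the chapter and is taken here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical genotypic (G) and residual (E) covariance matrices, t = 3 traits
G = np.array([[1.5, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 0.6]])
E = np.array([[2.5, 0.6, 0.3],
              [0.6, 2.0, 0.5],
              [0.3, 0.5, 1.4]])
P = G + E                        # phenotypic covariance when y = g + e
w = np.array([1.0, 0.5, 2.0])    # hypothetical economic weights

# Simulate centered breeding values and phenotypes for n individuals
n = 20_000
g = rng.multivariate_normal(np.zeros(3), G, size=n)
e = rng.multivariate_normal(np.zeros(3), E, size=n)
y = g + e

# Assumed closed form of the LPSI coefficients: b = P^{-1} G w
b = np.linalg.solve(P, G @ w)

H = g @ w   # net genetic merit, Eq. (2.2); known only because g is simulated
I = y @ b   # LPSI values, Eq. (2.3)

rho_empirical = np.corrcoef(H, I)[0, 1]
rho_theory = np.sqrt((b @ P @ b) / (w @ G @ w))
# Correlation obtained when the economic weights are used directly as coefficients
rho_naive = np.sqrt((w @ G @ w) / (w @ P @ w))
```

With these inputs, `rho_theory` is never smaller than `rho_naive`, which is the sense in which a single vector of coefficients maximizes the correlation with the net genetic merit.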

Figure 2.2 indicates that the regression of the net genetic merit on the LPSI is linear and that the correlation between the LPSI and the net genetic merit is maximal in each selection cycle. Also, note that the true correlations between the LPSI and the net genetic merit, and the true regression coefficients of the net genetic merit over the LPSI are the same, but the estimated correlation values between the LPSI and the net genetic merit are lower than the true correlation (Fig. 2.2). Table 2.1 indicates that the LPSI in the ith selection cycle and the LPSI in the (i + 1)th selection cycle do not correlate. However, in practice, the correlation values between any pair of LPSIs could be different from zero in successive selection cycles.

One fundamental assumption of the LPSI is that $I = \mathbf{b}'\mathbf{y}$ has normal distribution. This assumption is illustrated in Fig. 2.3 for two real datasets: a maize (Zea mays) F2 population with 252 lines and three traits (grain yield, ton ha<sup>-1</sup>; plant height, cm; and ear height, cm) evaluated in one environment; and a double haploid wheat (Triticum aestivum L.) population with 599 lines and one trait (grain yield, ton ha<sup>-1</sup>) evaluated in three environments. Figure 2.3 indicates that, in effect, the LPSI values approach normal distribution when the number of lines is very large.

Fig. 2.2 True correlation (TC) and estimated correlation (ECO) values between the linear phenotypic selection index (LPSI) and the net genetic merit for seven selection cycles, and true regression coefficient (TRC) of the net genetic merit over the LPSI for four traits and 500 genotypes in one environment simulated for seven selection cycles


#### 2.3 Fundamental Parameters of the LPSI

Table 2.1 Estimated correlation values between the linear phenotypic selection index (LPSI) values in seven simulated selection cycles

There are two fundamental parameters associated with the LPSI theory: the selection response ($R$) and the expected genetic gain per trait ($\mathbf{E}$). In general terms, the selection response is the difference between the mean phenotypic value of the offspring ($\mu_O$) of the selected parents and the mean of the entire parental generation ($\mu_P$) before selection, i.e., $R = \mu_O - \mu_P$ (Hazel and Lush 1942; Falconer and Mackay 1996). The expected genetic gain per trait (or multi-trait selection response) is the covariance between the breeding value vector and the LPSI ($I$) values divided by the standard deviation of $I$ ($\sigma_I$), i.e., $\frac{Cov(I, \mathbf{g})}{\sigma_I} = \frac{\mathbf{G}\mathbf{b}}{\sigma_I}$, multiplied by the selection intensity. This is one form of the LPSI multi-trait selection response. In the univariate context, the expected genetic gain per trait is the same as the selection response.

Fig. 2.3 Maize LPSI (Fig. 2.3a) is the distribution of 252 values of the LPSI constructed with the phenotypic means of three maize (Zea mays) F2 population traits: grain yield (ton ha<sup>-1</sup>), PHT (cm) and EHT (cm), evaluated in one environment. Wheat LPSI (Fig. 2.3b) is the distribution of 599 LPSI values constructed with the phenotypic means of the grain yield (ton ha<sup>-1</sup>) of a double haploid wheat (Triticum aestivum L.) population evaluated in three environments
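The expected genetic gain per trait described above, the covariance vector $\mathbf{G}\mathbf{b}$ divided by $\sigma_I$ and multiplied by the selection intensity, can be computed directly. The sketch below uses hypothetical covariance matrices and economic weights, an assumed selection intensity of 1.755 (roughly 10% of individuals selected under normality), and the standard closed form $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ as an assumption.

```python
import numpy as np

# Hypothetical phenotypic (P) and genotypic (G) covariance matrices, 3 traits
P = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.8],
              [0.5, 0.8, 2.0]])
G = np.array([[1.5, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 0.6]])
w = np.array([1.0, 0.5, 2.0])   # hypothetical economic weights
k = 1.755                       # assumed selection intensity (about 10% selected)

b = np.linalg.solve(P, G @ w)   # assumed LPSI coefficients, b = P^{-1} G w
sigma_I = np.sqrt(b @ P @ b)    # standard deviation of the index I = b'y

# Expected genetic gain per trait: k * Cov(I, g) / sigma_I = k * G b / sigma_I
expected_gain = k * (G @ b) / sigma_I

# Weighting the per-trait gains by w gives the gain in net genetic merit
gain_in_H = w @ expected_gain
```

For the coefficients used here, `gain_in_H` equals `k * sigma_I`, because $\mathbf{w}'\mathbf{G}\mathbf{b} = \mathbf{b}'\mathbf{P}\mathbf{b}$ when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$.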

One additional way of defining the selection response is based on the selection differential ($D$). The selection differential is the mean phenotypic value of the individuals selected as parents ($\mu_S$) expressed as a deviation from the population mean ($\mu_P$) or parental generation before the selection was made (Falconer and Mackay 1996); that is, $D = \mu_S - \mu_P$. Thus, another way of defining $R$ is as the part of the expected differential of selection ($D = \mu_S - \mu_P$) that is gained when selection is applied (Kempthorne and Nordskog 1959); that is

$$R = \frac{Cov(g, y)}{\sigma_y^2} D = k \sigma_y h^2,\tag{2.4}$$

where $Cov(g, y) = \sigma_g^2$ is the covariance between $g$ and $y$, $g$ is the individual breeding value associated with trait $y$, $\sigma_y^2$ is the variance of $y$, $k = \frac{D}{\sigma_y}$ is the standardized selection differential or selection intensity, and $h^2 = \frac{\sigma_g^2}{\sigma_y^2}$ is the heritability of trait $y$ in the base population. Heritability ($h^2$) appears in Eq. (2.4) as a measure of the accuracy with which animals or plants having the highest genetic values can be chosen by selecting directly for phenotype (Hazel and Lush 1942).

The selection response (Eq. 2.4) is the mean of the progeny of the selected parents or the future population mean of the trait under selection (Cochran 1951). Thus, the selection response enables breeders to estimate the expected progress of the selection before carrying it out. This information gives improvement programs a clearer orientation and helps to predict the success of the selection method adopted and choose the option that is technically most effective on a scientific basis (Costa et al. 2008). Equation (2.4) is very powerful but its application requires strong assumptions. For example, Eq. (2.4) assumes that the trait of interest does not correlate with other traits having causal effects on fitness and, in its multivariate form, the validity of predicted change rests on the assumption that all such correlated traits have been measured and incorporated into the analysis (Morrissey et al. 2010).
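Equation (2.4) can be checked with a simple simulation of truncation selection on a single trait. The sketch below is illustrative only: the variances, the population size, and the selected proportion are hypothetical, the trait is simulated as normally distributed with purely additive breeding values, and the realized response is measured as the mean breeding value of the selected parents.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)

n = 100_000
sigma2_g, sigma2_e = 10.0, 15.0           # hypothetical variance components
g = rng.normal(0.0, np.sqrt(sigma2_g), n) # breeding values
e = rng.normal(0.0, np.sqrt(sigma2_e), n) # environmental deviations
y = g + e                                 # phenotypes, Eq. (2.1)

h2 = sigma2_g / (sigma2_g + sigma2_e)     # heritability, here 0.4
p = 0.10                                  # proportion selected (assumed)

# Truncation selection: keep the individuals with the highest phenotypes
threshold = np.quantile(y, 1.0 - p)
selected = y >= threshold

D = y[selected].mean() - y.mean()         # selection differential
k = D / y.std()                           # standardized selection differential
R_predicted = h2 * D                      # Eq. (2.4): [Cov(g, y) / var(y)] * D
R_realized = g[selected].mean() - g.mean()

# Selection intensity expected under normality: density at the cut-off over p
z = NormalDist().inv_cdf(1.0 - p)
k_theory = NormalDist().pdf(z) / p
```

In a run of this size the simulated `k` is close to `k_theory`, and `R_predicted` is close to `R_realized`, up to sampling error.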

#### 2.3.1 The LPSI Selection Response

The univariate selection response (Eq. 2.4) can also be rewritten as

$$R = k \sigma_y h^2 = k \sigma_g \rho_{gy},\tag{2.5}$$

where $\sigma_g$ is the square root of the variance $\sigma_g^2$ defined in Eq. (2.4) and $\rho_{gy}$ is the correlation between $g$ and $y$. Thus, as $H = \mathbf{w}'\mathbf{g}$ and $I = \mathbf{b}'\mathbf{y}$ are univariate random variables, the selection response of the LPSI ($R_I$) can be written in a form similar to Eq. (2.5), i.e.,

$$R\_I = k\_I \sigma\_H \rho\_{HI},\tag{2.6}$$

where $\sigma_H$ is the standard deviation of $H = \mathbf{w}'\mathbf{g}$ and $\rho_{HI}$ is the correlation between $H$ and $I = \mathbf{b}'\mathbf{y}$; $k_I = (\mu_{IA} - \mu_{IB})/\sigma_I$ is the standardized selection differential or the selection intensity associated with the LPSI, where $\sigma_I$ is the standard deviation of $I$; and $\mu_{IA}$ and $\mu_{IB}$ are the means of the LPSI values after and before selection respectively. The right-hand side of Eq. (2.6) ($k_I\sigma_H\rho_{HI}$) indicates that the genetic change due to selection is proportional to $k_I$, $\sigma_H$, and $\rho_{HI}$ (Kempthorne and Nordskog 1959). Thus, the genetic gain that can be achieved by selecting for several traits simultaneously within a population of animals or plants is the product of the standardized selection differential ($k_I$), the standard deviation of $H$ ($\sigma_H$), and the correlation between $H$ and $I$ ($\rho_{HI}$). Selection intensity $k_I$ is limited by the rate of reproduction of each species, whereas $\sigma_H$ is relatively beyond man's control; hence, the greatest opportunity for increasing selection progress is by ensuring that $\rho_{HI}$ is as large as possible (Hazel 1943). In general, it is assumed that $k_I$ and $\sigma_H$ are fixed and that $\mathbf{w}$ is known and fixed; hence, $R_I$ is maximized when $\rho_{HI}$ is maximized with respect to the LPSI vector of coefficients $\mathbf{b}$ only.

Equation (2.6) is the mean of $H = \mathbf{w}'\mathbf{g}$, whereas $\sigma_H^2\rho_{HI}^2(1 - v)$ is its variance and $\rho_{HI}^* = \rho_{HI}\sqrt{\frac{1 - v}{1 - v\rho_{HI}^2}}$ is the correlation between $H = \mathbf{w}'\mathbf{g}$ and $I = \mathbf{b}'\mathbf{y}$ after selection was carried out (Cochran 1951), where $v = k_I(k_I - \tau)$ and $\tau$ is the truncation point.

For example, if 5% of the population is selected, $k_I = 2.063$, $\tau = 1.645$, and $v = 0.862$ (Falconer and Mackay 1996, Table A). In R (in this case R denotes a platform for data analysis, see Kabakoff 2011 for details), the truncation point and selection intensity can be obtained as `v <- qnorm(1 - q)` and `k <- dnorm(v)/q`, respectively, where `q` is the proportion retained (in this R code, `v` stores the truncation point $\tau$, not the quantity $v = k_I(k_I - \tau)$). Both the variance and the correlation ($\rho_{HI}^*$) are reduced by selection. If $H = \mathbf{w}'\mathbf{g}$ could be selected directly, the gain in $H$ would be $k_I\sigma_H$. Thus, the gain due to indirect selection using $I = \mathbf{b}'\mathbf{y}$ is a fraction $\rho_{HI}$ of that due to direct selection using $H$. As $k_I$ increases, $R_I$ increases (Eq. 2.6), $\sqrt{\sigma_H^2\rho_{HI}^2(1 - v)}$ and $\rho_{HI}^*$ decrease, and the effects are in the same direction as $\rho_{HI}^*$ increases (Cochran 1951). These results should be valid for all selection indices described in this book.
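The same quantities can be reproduced outside R. The following is a minimal sketch in Python, assuming only the standard library (`statistics.NormalDist`); the function name `selection_parameters` is ours, not part of any package. It computes the truncation point $\tau$, the selection intensity $k_I$ and $v = k_I(k_I - \tau)$ for a proportion selected $q$.

```python
from statistics import NormalDist


def selection_parameters(q):
    """Return (tau, k, v) for a proportion q selected by truncation."""
    std_normal = NormalDist()            # standard normal: mean 0, sd 1
    tau = std_normal.inv_cdf(1.0 - q)    # truncation point, like qnorm(1 - q)
    k = std_normal.pdf(tau) / q          # selection intensity, like dnorm(tau)/q
    v = k * (k - tau)                    # v = k_I (k_I - tau)
    return tau, k, v


tau, k, v = selection_parameters(0.05)
# Agrees with the tabulated values quoted in the text: 1.645, 2.063, 0.862
print(f"tau = {tau:.3f}, k = {k:.3f}, v = {v:.3f}")
```

Calling `selection_parameters` with other values of `q` (for example 0.10 or 0.20) gives the corresponding intensities without a lookup table.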

Smith (1936) gave an additional method to obtain Eq. (2.6). Suppose that we have a large number of plant lines and we select a proportion $q$ for further propagation. In addition, assume that the values of $I$ for each line are normally distributed with variance $\sigma_I^2 = \mathbf{b}'\mathbf{Pb}$; let $I$ be transformed into a variable $u$ with unit variance and mean zero, that is, $u = (I - \mu_I)/\sigma_I$, where $\mu_I$ is the mean of $I$. Assume that all $I$ values higher than a value $I'$ are selected; then the value of $u' = (I' - \mu_I)/\sigma_I$ corresponding to any given value of $q$ may be ascertained from a table of the standard normal probability integral (Fig. 2.4).

Fig. 2.4 Graph of standardized LPSI values showing how a population can be separated sharply at a given point ($u'$) into a selected fraction ($q$), denoted by the red area, and a remainder that is culled, denoted by the white area

Assuming that the expectations of $H$ and $I$ are $E(H) = 0$ and $E(I) = \mu_I$, the conditional expectation of $H$ given $I$ can be written as $E(H/I) = \frac{\sigma_{HI}}{\sigma_I^2}[I - \mu_I] = \frac{\sigma_{HI}}{\sigma_I^2}\sigma_I u = B\sigma_I u$, where $B = \sigma_{HI}/\sigma_I^2$, $\sigma_{HI} = \mathbf{w}'\mathbf{Gb}$ is the covariance between $H$ and $I$, and $\sigma_I^2 = \mathbf{b}'\mathbf{Pb}$ is the variance of $I$. Therefore, if $\sigma_I^2$ and $\sigma_{HI}$ are fixed, the LPSI selection response ($R_I$) can be obtained as the expectation of the selected population, which has a univariate left-truncated normal distribution. A truncated distribution is a conditional distribution resulting when the domain of the parent distribution is restricted to a smaller region (Hattaway 2010). In the LPSI context, a truncated distribution occurs when a sample of individuals from the parent distribution is selected as parents for the next selection cycle, thus creating a new population of individuals that follow a truncated normal distribution. Thus, we need to find $E[E(H/I)] = q^{-1}B\sigma_I E(u)$, or, using integral calculus,

$$E[E(H/I)] = \frac{B\sigma_I}{q} \int_{u'}^{\infty} \frac{u}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}u^2\right\} du = \frac{z}{q} \sigma_H \rho_{HI},\tag{2.7}$$

where $z = \frac{\exp\{-0.5u'^2\}}{\sqrt{2\pi}}$ is the height of the ordinate of the normal curve at $u'$, the lowest value of $u$ retained, and $q$ is the proportion of the population of animal or plant lines that is selected (Fig. 2.4). The proportion $q$ that must be saved depends on the reproductive rate and longevity of the species under consideration and on whether the population is expanding, stationary or declining in numbers. The ordinate ($z$) of the normal curve is determined by the proportion selected ($q$) (Fig. 2.4). The amount of progress is expected to be larger as $q$ becomes smaller; that is, as selection becomes more intense (Hazel and Lush 1942). Kempthorne and Nordskog (1959) showed that $z/q = k_I$. Thus, Eqs. (2.6) and (2.7) are the same, that is, $E[E(H/I)] = R_I$.
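The identity behind Eq. (2.7), namely that the integral of $u$ times the standard normal density from $u'$ to infinity equals the ordinate $z$, can be checked numerically. The sketch below is an illustration only: it assumes the Python standard library, uses a simple trapezoidal rule with an arbitrary upper limit of 10 in place of infinity, and the helper name `truncated_integral` is ours.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_integral(u_prime, upper=10.0, steps=20_000):
    """Trapezoidal approximation of the integral of u * phi(u) on [u_prime, upper]."""
    h = (upper - u_prime) / steps

    def f(u):
        return u * STD_NORMAL.pdf(u)

    total = 0.5 * (f(u_prime) + f(upper))
    for i in range(1, steps):
        total += f(u_prime + i * h)
    return total * h


q = 0.05                                  # proportion selected
u_prime = STD_NORMAL.inv_cdf(1.0 - q)     # truncation point u'
z = STD_NORMAL.pdf(u_prime)               # ordinate of the normal curve at u'
integral = truncated_integral(u_prime)

k_from_integral = integral / q            # left-hand side of Eq. (2.7) without B * sigma_I
k_from_ordinate = z / q                   # z / q = k_I
print(k_from_integral, k_from_ordinate)
```

Both printed values should agree to several decimal places, which is the numerical counterpart of $z/q = k_I$.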

#### 2.3.2 The Maximized Selection Response

The main objective of the LPSI is to maximize the mean of $H = \mathbf{w}'\mathbf{g}$ (Eq. 2.7). Assuming that $\mathbf{P}$, $\mathbf{G}$, $\mathbf{w}$, and $k_I$ are known, to maximize $R_I$ we can either maximize $\rho_{HI}$ or minimize the mean squared difference between $I$ and $H$, $E[(H - I)^2] = \mathbf{w}'\mathbf{Gw} + \mathbf{b}'\mathbf{Pb} - 2\mathbf{w}'\mathbf{Gb}$, with respect to $\mathbf{b}$, that is, $\frac{\partial}{\partial \mathbf{b}}E[(H - I)^2] = 2\mathbf{Pb} - 2\mathbf{Gw} = \mathbf{0}$, from where

$$\mathbf{b} = \mathbf{P}^{-1} \mathbf{G} \mathbf{w} \tag{2.8}$$

is the vector that simultaneously minimizes $E[(H - I)^2]$ and maximizes $\rho_{HI}$, and then $R_I = k_I\sigma_H\rho_{HI}$.

By Eq. (2.8), the maximized LPSI selection response can be written as

$$R\_I = k\_I \sqrt{\mathbf{b}' \mathbf{P} \mathbf{b}}.\tag{2.9}$$

The maximized LPSI selection response predicts the mean improvement in $H$ due to indirect selection on $I$ only when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$ (Harris 1964) and is proportional to the standard deviation of the LPSI ($\sigma_I$) and to the standardized selection differential or selection intensity ($k_I$).

The maximized LPSI selection response (Eq. 2.9) is related to the Cauchy–Schwarz inequality (Rao 2002; Cerón-Rojas et al. 2006), which establishes that for any pair of vectors $\mathbf{u}$ and $\mathbf{v}$, if $\mathbf{A}$ is a positive definite matrix, then the inequality $(\mathbf{u}'\mathbf{v})^2 \le (\mathbf{v}'\mathbf{Av})(\mathbf{u}'\mathbf{A}^{-1}\mathbf{u})$ holds. Kempthorne and Nordskog (1959) proved that maximizing $\rho_{HI}^2 = \frac{(\mathbf{w}'\mathbf{Gb})^2}{(\mathbf{w}'\mathbf{Gw})(\mathbf{b}'\mathbf{Pb})}$ also maximizes $R_I$. According to Eqs. (2.6) and (2.7), $R_I^2$ can be written as $R_I^2 = k_I^2\frac{(\mathbf{w}'\mathbf{Gb})^2}{\mathbf{b}'\mathbf{Pb}}$, such that maximizing $R_I^2$ is equivalent to maximizing $\frac{(\mathbf{w}'\mathbf{Gb})^2}{\mathbf{b}'\mathbf{Pb}}$. Let $\mathbf{Gw} = \mathbf{u}$, $\mathbf{b} = \mathbf{v}$, and $\mathbf{A} = \mathbf{P}$; by the Cauchy–Schwarz inequality, $\frac{(\mathbf{w}'\mathbf{Gb})^2}{\mathbf{b}'\mathbf{Pb}} \le \mathbf{w}'\mathbf{GP}^{-1}\mathbf{Gw}$. This implies that the maximum is reached when $\frac{(\mathbf{w}'\mathbf{Gb})^2}{\mathbf{b}'\mathbf{Pb}} = \mathbf{w}'\mathbf{GP}^{-1}\mathbf{Gw}$, at which point $R_I = k_I\sqrt{\mathbf{w}'\mathbf{GP}^{-1}\mathbf{Gw}}$. This latter result is the same as Eq. (2.9) when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$.

The result $R_I = k_I\sqrt{\mathbf{w}'\mathbf{GP}^{-1}\mathbf{Gw}}$ obtained using the Cauchy–Schwarz inequality corroborates that $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$ (Eq. 2.8) is a global minimum when the mean squared difference between $I$ and $H$ ($E[(H - I)^2]$) is minimized, and a global maximum when the correlation $\rho_{HI}$ between $I$ and $H$ is maximized, because $R_I = k_I\sqrt{\mathbf{b}'\mathbf{Pb}} = k_I\sqrt{\mathbf{w}'\mathbf{GP}^{-1}\mathbf{Gw}}$ only when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$.
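The step linking Eqs. (2.6), (2.8) and (2.9) can be written out explicitly. The following short derivation is a sketch that assumes, as in the text, that $\mathbf{P}$ is symmetric and positive definite and that $\mathbf{G}$ is symmetric. Substituting $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$ gives

$$\mathbf{b}'\mathbf{Pb} = \mathbf{w}'\mathbf{GP}^{-1}\mathbf{P}\mathbf{P}^{-1}\mathbf{Gw} = \mathbf{w}'\mathbf{GP}^{-1}\mathbf{Gw} = \mathbf{w}'\mathbf{Gb},$$

so that $\sigma_I^2 = \sigma_{HI}$ at the optimum. It then follows that

$$\rho_{HI} = \frac{\sigma_{HI}}{\sigma_H \sigma_I} = \frac{\sigma_I^2}{\sigma_H \sigma_I} = \frac{\sigma_I}{\sigma_H}, \qquad R_I = k_I \sigma_H \rho_{HI} = k_I \sigma_I = k_I \sqrt{\mathbf{b}'\mathbf{Pb}}.$$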

#### 2.3.3 The LPSI Expected Genetic Gain Per Trait

Whereas $R = \frac{Cov(g, y)}{\sigma_y^2}D$ (Eq. 2.4) denotes the selection response in the univariate case, $\mathbf{E} = \frac{Cov(I, \mathbf{g})}{\sigma_I}$ denotes the LPSI expected genetic gain per trait. Also, apart from the factor $D/\sigma_y$, $\frac{Cov(g, y)}{\sigma_y}$ and $\frac{Cov(I, \mathbf{g})}{\sigma_I}$ are mathematically equivalent: whereas $\frac{Cov(g, y)}{\sigma_y}$ is the covariance between $g$ and $y$ weighted by the standard deviation of $y$, $\frac{Cov(I, \mathbf{g})}{\sigma_I}$ is the covariance between the breeding value vector and the LPSI values weighted by the standard deviation of the LPSI. This means that, in effect, $\mathbf{E}$ is the LPSI multi-trait selection response and can be written as

$$\mathbf{E} = k\_I \frac{\mathbf{G} \mathbf{b}}{\sigma\_I},\tag{2.10}$$

where $\mathbf{G}$, $\sigma_I$ and $k_I$ were defined earlier. As Eq. (2.10) is the covariance between $I = \mathbf{b}'\mathbf{y}$ and $\mathbf{g}' = [g_1 \; g_2 \; \dots \; g_t]$ divided by $\sigma_I$, considering $g_j$ and $I = \sum_{i=1}^{t} b_i y_i$, the genetic gain in the $j$th index trait due to selection on $I$ will be

$$\frac{k_I}{\sigma_I} Cov(I, g_j) = \frac{k_I}{\sigma_I} \left[ b_1 \sigma_{1j} + b_2 \sigma_{2j} + \dots + b_j \sigma_j^2 + \dots + b_t \sigma_{tj} \right] = k_I \frac{\mathbf{b}' \boldsymbol{\sigma}_j}{\sigma_I}, \tag{2.11}$$

where $\boldsymbol{\sigma}_j' = [\sigma_{1j} \; \sigma_{2j} \; \dots \; \sigma_{tj}]$ is a vector of genotypic covariances of the $j$th index trait with all the index traits (Lin 1978; Brascamp 1984).

If Eq. (2.11) is multiplied by the economic weight of the $j$th trait ($w_j$), we obtain a measure of the economic value of each trait included in the net genetic merit (Cunningham and Tauebert 2009). In percentage terms, the economic value attributable to genetic change in the $j$th trait can be written as

$$w_j \frac{\mathbf{b}' \boldsymbol{\sigma}_j}{\sigma_I^2} 100.\tag{2.12}$$

In addition, the percentage reduction in the overall genetic gain in net genetic merit if the $j$th trait is omitted from the LPSI (Cunningham and Tauebert 2009) is

$$\left[1 - \sqrt{1 - \frac{b_j^2}{\sigma_I^2 \varphi_j^{-2}}}\right] 100,\tag{2.13}$$

where $\varphi_j^{-2}$ is the $j$th diagonal element of the inverse of the phenotypic covariance matrix $\mathbf{P}^{-1}$ and $b_j^2$ is the square of the $j$th coefficient of the LPSI. Equations (2.12) and (2.13) are measures of the importance of each trait included in the LPSI when selection is made.
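As an illustration of Eqs. (2.10), (2.12) and (2.13), the sketch below uses the estimated matrices and the rounded coefficients reported for cycle C1 in Sect. 2.8.2. The selection intensity $k_I = 2.063$ (5% selected) is our assumption for the example, and the helper names `matvec` and `dot` are ours. It is written in plain Python so that it runs without extra packages.

```python
import math

# Estimates reported for cycle C1 (traits T1, T2, T3) in Sect. 2.8.2
P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
P_INV_DIAG = [0.01997, 0.06809, 0.09123]  # diagonal of the reported inverse of P
w = [1.0, -1.0, 1.0]                      # economic weights
b = [0.555, -1.063, 1.087]                # reported (rounded) LPSI coefficients
k_I = 2.063                               # assumed: 5% of the genotypes selected


def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * c for a, c in zip(u, v))


var_I = dot(b, matvec(P, b))              # sigma_I^2 = b'Pb
sigma_I = math.sqrt(var_I)
Gb = matvec(G, b)                         # j-th entry is b' sigma_j

gain_per_trait = [k_I * x / sigma_I for x in Gb]                 # Eq. (2.10)
pct_economic = [100.0 * wj * x / var_I for wj, x in zip(w, Gb)]  # Eq. (2.12)
pct_loss_if_dropped = [                                          # Eq. (2.13)
    100.0 * (1.0 - math.sqrt(1.0 - bj ** 2 / (var_I * d)))
    for bj, d in zip(b, P_INV_DIAG)
]

for j in range(3):
    print(f"T{j + 1}: gain = {gain_per_trait[j]:.2f}, "
          f"economic share = {pct_economic[j]:.1f}%, "
          f"loss if omitted = {pct_loss_if_dropped[j]:.1f}%")
```

Because $\mathbf{w}'\mathbf{Gb} = \mathbf{b}'\mathbf{Pb}$ when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$, the percentages from Eq. (2.12) should add up to approximately 100; small deviations here come from the rounded coefficients.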

#### 2.3.4 Heritability of the LPSI

As the variance of $I = \mathbf{b}'\mathbf{y}$ is equal to $\sigma_I^2 = \mathbf{b}'\mathbf{Pb} = \mathbf{b}'\mathbf{Gb} + \mathbf{b}'\mathbf{Rb}$, where $\mathbf{P} = \mathbf{G} + \mathbf{R}$, and $\mathbf{P}$, $\mathbf{G}$ and $\mathbf{R}$ are the phenotypic, genetic, and residual covariance matrices respectively, the LPSI heritability (Lin and Allaire 1977; Nordskog 1978) can be written as

$$h\_I^2 = \frac{\mathbf{b}' \mathbf{G} \mathbf{b}}{\mathbf{b}' \mathbf{P} \mathbf{b}}.\tag{2.14}$$

When selecting a trait, the correlation between the phenotypic and genotypic values is equal to the square root of the trait's heritability ($\rho_{gy} = h$); however, in the LPSI context, when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$, the maximized correlation between $H$ and $I$ is $\rho_{HI} = \sqrt{\frac{\mathbf{b}'\mathbf{Pb}}{\mathbf{w}'\mathbf{Gw}}} = \frac{\sigma_I}{\sigma_H}$, whereas $h_I = \sqrt{\frac{\mathbf{b}'\mathbf{Gb}}{\mathbf{b}'\mathbf{Pb}}}$ is the square root of the heritability of $I$; that is, from a mathematical point of view, $\rho_{HI} \neq h_I$. In practice, $h_I^2$ and $\rho_{HI}^2$ give similar results (Fig. 2.5).

Fig. 2.5 Estimated values of the squared correlation between the LPSI and the net genetic merit ($H = \mathbf{w}'\mathbf{g}$) and the LPSI heritability for four traits and 500 genotypes in one environment simulated for seven selection cycles
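To make the distinction between $h_I^2$ (Eq. 2.14) and $\rho_{HI}^2$ concrete, the following sketch evaluates both with the estimated matrices and rounded coefficients reported for cycle C1 in Sect. 2.8.2. It is an illustration in plain Python; the helper names `matvec` and `dot` are ours.

```python
# Estimates reported for cycle C1 (traits T1, T2, T3) in Sect. 2.8.2
P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [1.0, -1.0, 1.0]            # economic weights
b = [0.555, -1.063, 1.087]      # reported (rounded) LPSI coefficients


def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * c for a, c in zip(u, v))


var_I = dot(b, matvec(P, b))    # b'Pb
var_H = dot(w, matvec(G, w))    # w'Gw
cov_HI = dot(w, matvec(G, b))   # w'Gb

h2_I = dot(b, matvec(G, b)) / var_I          # Eq. (2.14)
rho2_HI = cov_HI ** 2 / (var_H * var_I)      # squared correlation between H and I

print(f"h2_I = {h2_I:.3f}, rho2_HI = {rho2_HI:.3f}")
```

For these estimates the two quantities are close but not identical, in line with the remark that $\rho_{HI} \neq h_I$ even though they tend to give similar results in practice.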

#### 2.4 Statistical LPSI Properties

Assuming that $H$ and $I$ have a joint bivariate normal distribution, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$, and $\mathbf{P}$, $\mathbf{G}$ and $\mathbf{w}$ are known, the statistical LPSI properties (Henderson 1963) are the following:


$$\rho_{HI} = \frac{\mathbf{w}' \mathbf{Gb}}{\sqrt{\mathbf{w}' \mathbf{Gw}} \sqrt{\mathbf{b}' \mathbf{Pb}}} = \frac{\mathbf{w}' \mathbf{GP}^{-1} \mathbf{Gw}}{\sqrt{\mathbf{w}' \mathbf{Gw}} \sqrt{\mathbf{w}' \mathbf{GP}^{-1} \mathbf{Gw}}} = \sqrt{\frac{\mathbf{w}' \mathbf{GP}^{-1} \mathbf{Gw}}{\mathbf{w}' \mathbf{Gw}}} = \frac{\sigma_I}{\sigma_H}, \text{ thus } \rho_{HI} = \frac{\sigma_I}{\sigma_H}.$$


#### 2.5 Particular Cases of the LPSI

#### 2.5.1 The Base LPSI

To derive the LPSI theory, we assumed that the phenotypic ($\mathbf{P}$) and the genotypic ($\mathbf{G}$) covariance matrices, and the vector of economic values ($\mathbf{w}$), are known. However, $\mathbf{P}$, $\mathbf{G}$, and $\mathbf{w}$ are generally unknown and it is necessary to estimate them. There are many methods for estimating $\mathbf{P}$ and $\mathbf{G}$ (Lynch and Walsh 1998) and $\mathbf{w}$ (Cotterill and Jackson 1985; Magnussen 1990). However, when the estimator of $\mathbf{P}$ ($\hat{\mathbf{P}}$) is not positive definite (all eigenvalues positive) or the estimator of $\mathbf{G}$ ($\hat{\mathbf{G}}$) is not positive semidefinite (no negative eigenvalues), the estimator of $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$ ($\hat{\mathbf{b}} = \hat{\mathbf{P}}^{-1}\hat{\mathbf{G}}\mathbf{w}$) could be biased. In this case, the base linear phenotypic selection index (BLPSI):

$$I\_B = \mathbf{w'}\mathbf{y}\tag{2.15}$$

may be a better predictor of $H = \mathbf{w}'\mathbf{g}$ than the estimated LPSI $\hat{I} = \hat{\mathbf{b}}'\mathbf{y}$ (Williams 1962a; Lin 1978) if the vector of economic values $\mathbf{w}$ is indeed known. Many authors (Williams 1962b; Harris 1964; Hayes and Hill 1980, 1981) have investigated the influence of parameter estimation errors on LPSI accuracy and concluded that those errors affect the accuracy of $\hat{I} = \hat{\mathbf{b}}'\mathbf{y}$ when the accuracy of $\hat{\mathbf{P}}$ and $\hat{\mathbf{G}}$ is low. If the values of vector $\mathbf{w}$ are known, the BLPSI has certain advantages because of its simplicity and its freedom from parameter estimation errors (Lin 1978). Williams (1962a) pointed out that the BLPSI is superior to $\hat{I} = \hat{\mathbf{b}}'\mathbf{y}$ unless a large amount of data is available for estimating $\mathbf{P}$ and $\mathbf{G}$.

There are some problems associated with the BLPSI. For example, what are the BLPSI selection response and the BLPSI expected genetic gains per trait when no data are available for estimating $\mathbf{P}$ and $\mathbf{G}$? The BLPSI is a better selection index than the standard LPSI only if the correlation between the BLPSI and the net genetic merit is higher than that between the LPSI and the net genetic merit (Hazel 1943). However, if estimates of $\mathbf{P}$ and $\mathbf{G}$ are not available, how can the correlation between the base index and the net genetic merit be obtained? Williams (1962b) pointed out that the correlation between the BLPSI and $H = \mathbf{w}'\mathbf{g}$ can be written as

$$\rho_{HI_B} = \sqrt{\frac{\mathbf{w}' \mathbf{G} \mathbf{w}}{\mathbf{w}' \mathbf{P} \mathbf{w}}} \tag{2.16}$$

and indicated that the ratio $\rho_{HI_B}/\rho_{HI}$ can be used to compare LPSI efficiency versus BLPSI efficiency; however, in the latter case, at least the estimates of $\mathbf{P}$ and $\mathbf{G}$, i.e., $\hat{\mathbf{P}}$ and $\hat{\mathbf{G}}$, need to be known.

In addition, Eq. (2.15) is only an assumption, not a result, and implies that $\mathbf{P}$ and $\mathbf{G}$ are the same. That is, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw} = \mathbf{w}$ only when $\mathbf{P} = \mathbf{G}$, which indicates that the BLPSI is a special case of the LPSI. Thus, to obtain the selection response and the expected genetic gains per trait of the BLPSI, we need some information about $\mathbf{P}$ and $\mathbf{G}$. Assuming that the BLPSI is indeed a particular case of the LPSI, the BLPSI selection response and the BLPSI expected genetic gains per trait could be written as

$$R\_B = k\_I \sqrt{\mathbf{w}' \mathbf{P} \mathbf{w}},\tag{2.17}$$

and

$$\mathbf{E}\_B = k\_I \frac{\mathbf{G} \mathbf{w}}{\sqrt{\mathbf{w}' \mathbf{P} \mathbf{w}}},\tag{2.18}$$

respectively. The parameters of Eqs. (2.17) and (2.18) were defined earlier.

There are additional implications if $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw} = \mathbf{w}$. For example, if $\mathbf{P} = \mathbf{G}$, then $\rho_{HI_B} = \sqrt{\frac{\mathbf{w}'\mathbf{Gw}}{\mathbf{w}'\mathbf{Pw}}}$ and the BLPSI heritability $h_{I_B}^2 = \frac{\mathbf{w}'\mathbf{Gw}}{\mathbf{w}'\mathbf{Pw}}$ are equal to 1. However, in practice, the estimated values of $\rho_{HI_B}$ ($\hat{\rho}_{HI_B}$) are usually lower than the estimated values of $\rho_{HI}$ ($\hat{\rho}_{HI}$) (Fig. 2.6).

#### 2.5.2 The LPSI for Independent Traits

Suppose that the traits under selection are independent; then $\mathbf{P}$ and $\mathbf{G}$ are diagonal matrices and $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$ is a vector of single-trait heritabilities multiplied by the economic weights, because $\mathbf{P}^{-1}\mathbf{G}$ is the matrix of multi-trait heritabilities (Xu and Muir 1992). Based on this result, Hazel and Lush (1942) and Smith et al. (1981) used trait heritabilities multiplied by the economic weights (or heritabilities only) as coefficients of the LPSI. Thus, when the traits are independent and the economic weights are known, the LPSI can be constructed as

$$I = \sum_{i=1}^{t} w_i h_i^2 y_i,\tag{2.19}$$

and when the economic weights are unknown, the LPSI can be constructed as

$$I = \sum_{i=1}^{t} h_i^2 y_i. \tag{2.20}$$

The selection responses of Eqs. (2.19) and (2.20) can be seen in Hazel and Lush (1942).

Fig. 2.6 Values of the true correlation between the LPSI and the net genetic merit ($H = \mathbf{w}'\mathbf{g}$) (True-C), the estimated correlation between the LPSI and $H$ (LPSI-C), and the estimated correlation between the base index and $H$ (Base-C) for four traits and 500 genotypes in one environment simulated for seven selection cycles

#### 2.6 Criteria for Comparing LPSI Efficiency

Assuming that the intensity of selection is the same in both indices, we can compare BLPSI ($I_B = \mathbf{w}'\mathbf{y}$) efficiency versus LPSI efficiency to predict the net genetic merit in percentage terms as

$$p = 100(\lambda - 1),\tag{2.21}$$

where $\lambda = \rho_{HI}/\rho_{HI_B}$ (Williams 1962b; Bulmer 1980). Therefore, when $p = 0$, the efficiency of both indices is the same; when $p > 0$, the efficiency of the LPSI is higher than the base index efficiency, and when $p < 0$, the base index efficiency is higher than LPSI efficiency (Fig. 2.6). Equation (2.21) is useful for comparing the efficiency of any linear selection index, as we shall see in this book.
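As a sketch of how Eq. (2.21) might be evaluated, the code below computes $\rho_{HI}$, $\rho_{HI_B}$ (Eq. 2.16) and $p$ from the estimated matrices and rounded coefficients reported for cycle C1 in Sect. 2.8.2. It is an illustration in plain Python, not the procedure used to produce Fig. 2.6, and the helper names are ours.

```python
import math

# Estimates reported for cycle C1 (traits T1, T2, T3) in Sect. 2.8.2
P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [1.0, -1.0, 1.0]            # economic weights
b = [0.555, -1.063, 1.087]      # reported (rounded) LPSI coefficients


def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * c for a, c in zip(u, v))


var_H = dot(w, matvec(G, w))                       # w'Gw
rho_HI = dot(w, matvec(G, b)) / math.sqrt(var_H * dot(b, matvec(P, b)))
rho_HIB = math.sqrt(var_H / dot(w, matvec(P, w)))  # Eq. (2.16)

lam = rho_HI / rho_HIB
p = 100.0 * (lam - 1.0)                            # Eq. (2.21)

print(f"rho_HI = {rho_HI:.3f}, rho_HIB = {rho_HIB:.3f}, p = {p:.2f}%")
```

A positive `p` indicates that, for these estimates, the LPSI is expected to predict the net genetic merit somewhat better than the base index.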

#### 2.7 Estimating Matrices G and P

To derive the LPSI theory we assumed that matrices $\mathbf{P}$ and $\mathbf{G}$ are known. In practice, we have to estimate them. Matrices $\mathbf{P}$ and $\mathbf{G}$ can be estimated by analysis of variance (ANOVA), maximum likelihood or restricted maximum likelihood (REML) (Baker 1986; Lynch and Walsh 1998; Searle et al. 2006; Hallauer et al. 2010). Equation (2.1) is the simplest model because we only need to estimate two variance components: the genotypic variance ($\sigma_g^2$) and the residual variance ($\sigma_e^2$), from where the phenotypic variance for trait $y$ is the sum of $\sigma_g^2$ and $\sigma_e^2$, that is, $\sigma_y^2 = \sigma_g^2 + \sigma_e^2$. However, to construct matrices $\mathbf{P}$ and $\mathbf{G}$, we also need the covariance between any two traits. Thus, if $y_i$ and $y_j$ ($i, j = 1, 2, \dots, t$) are any two traits, then the covariance between $y_i$ and $y_j$ ($\sigma_{y_{ij}}$) can be written as $\sigma_{y_{ij}} = \sigma_{g_{ij}} + \sigma_{e_{ij}}$, where $\sigma_{g_{ij}}$ and $\sigma_{e_{ij}}$ denote the genotypic and residual covariance respectively of traits $y_i$ and $y_j$.

Several authors (Baker 1986; Lynch and Walsh 1998; Hallauer et al. 2010) have described ANOVA methods for estimating matrix $\mathbf{G}$ using data from specific designs (for example, half-sib or full-sib families) when the sample sizes are well balanced. In the ANOVA method, the observed mean squares are equated to their expected values; the expected values are linear functions of the unknown variance components, so the resulting equations form a set of simultaneous linear equations in the variance components. The ANOVA method does not need assumptions of normality because the variance component estimators do not depend on normality assumptions (Lynch and Walsh 1998; Hallauer et al. 2010).

In cases where the sample sizes are not well balanced, Lynch and Walsh (1998) and Fry (2004) proposed using the REML method to estimate matrix G. The REML estimation method does not require a specific design or balanced data and can be used to estimate genetic and residual variance and covariance in any arbitrary pedigree of individuals. The REML method is based on projecting the data in a subspace free of fixed effects and maximizing the likelihood function in this subspace, and has the advantage of producing the same results as the ANOVA in balanced designs (Blasco 2001).

In the context of the linear mixed model, Lynch and Walsh (1998) have given formulas for estimating variances $\sigma_g^2$ and $\sigma_e^2$ that can be adapted to estimate covariances $\sigma_{g_{ij}}$ and $\sigma_{e_{ij}}$. Suppose that we want to estimate $\sigma_g^2$ and $\sigma_e^2$ for the $q$th trait ($q = 1, 2, \dots, t$; $t$ = number of traits) in the absence of dominance and epistatic effects using the model $\mathbf{y}_q = \mathbf{1}\mu_q + \mathbf{Zg}_q + \mathbf{e}_q$, where the vector of averages $\mathbf{y}_q \sim NMV(\mathbf{1}\mu_q, \mathbf{V}_q)$ is $g \times 1$ ($g$ = number of genotypes in the population) and has a multivariate normal distribution; $\mathbf{1}$ is a $g \times 1$ vector of ones, $\mu_q$ is the mean of the $q$th trait, $\mathbf{Z}$ is a $g \times g$ identity matrix, $\mathbf{g}_q \sim NMV(\mathbf{0}, \mathbf{A}\sigma_{g_q}^2)$ is a vector of true breeding values, and $\mathbf{e}_q \sim NMV(\mathbf{0}, \mathbf{I}\sigma_{e_q}^2)$ is a $g \times 1$ vector of residuals, where NMV stands for normal multivariate distribution. Matrix $\mathbf{A}$ denotes the numerical relationship matrix between individuals (Lynch and Walsh 1998; Mrode 2005) and $\mathbf{V}_q = \mathbf{A}\sigma_{g_q}^2 + \mathbf{I}\sigma_{e_q}^2$.

The expectation–maximization algorithm allows the REML estimates of the variance components $\sigma_{g_q}^2$ and $\sigma_{e_q}^2$ to be computed by iterating the following equations:

$$\sigma_{g_q}^{2(n+1)} = \sigma_{g_q}^{2(n)} + \frac{\left(\sigma_{g_q}^{2(n)}\right)^2}{g} \left[ \mathbf{y}_q' \left( \mathbf{T}^{(n)} \mathbf{A} \mathbf{T}^{(n)} \right) \mathbf{y}_q - tr \left( \mathbf{T}^{(n)} \mathbf{A} \right) \right] \tag{2.22}$$

and

$$
\sigma\_{e\_q}^{2(n+1)} = \sigma\_{e\_q}^{2(n)} + \frac{\left(\sigma\_{e\_q}^{2(n)}\right)^2}{g} \left[ \mathbf{y}\_q' \left( \mathbf{T}^{(n)} \mathbf{T}^{(n)} \right) \mathbf{y}\_q - tr \left( \mathbf{T}^{(n)} \right) \right], \tag{2.23}
$$

where, after $n$ iterations, $\sigma_{g_q}^{2(n+1)}$ and $\sigma_{e_q}^{2(n+1)}$ are the estimated variance components of $\sigma_{g_q}^2$ and $\sigma_{e_q}^2$ respectively; $tr(\cdot)$ denotes the trace of the matrices within brackets; $\mathbf{T} = \mathbf{V}_q^{-1} - \mathbf{V}_q^{-1}\mathbf{1}\left(\mathbf{1}'\mathbf{V}_q^{-1}\mathbf{1}\right)^{-1}\mathbf{1}'\mathbf{V}_q^{-1}$ and $\mathbf{V}_q^{-1}$ is the inverse of matrix $\mathbf{V}_q = \mathbf{A}\sigma_{g_q}^2 + \mathbf{I}\sigma_{e_q}^2$. In $\mathbf{T}^{(n)}$, $\mathbf{V}_q^{-1(n)}$ is the inverse of matrix $\mathbf{V}_q^{(n)} = \mathbf{A}\sigma_{g_q}^{2(n)} + \mathbf{I}\sigma_{e_q}^{2(n)}$.
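One iteration of Eqs. (2.22) and (2.23) can be sketched as follows. This is an illustration under stated assumptions, not the software used for the examples in this book: the relationship matrix `A` (two unrelated pairs of full sibs), the data vector `y` and the starting values are hypothetical, and the helpers `inverse` and `em_reml_step` are ours. It is written in plain Python, so a small Gauss-Jordan routine stands in for a linear algebra library.

```python
def inverse(M):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * c for x, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def em_reml_step(y, A, var_g, var_e):
    """Apply Eqs. (2.22) and (2.23) once; return (new var_g, new var_e, T)."""
    g = len(y)
    V = [[A[i][j] * var_g + (var_e if i == j else 0.0) for j in range(g)]
         for i in range(g)]
    V_inv = inverse(V)
    v1 = [sum(row) for row in V_inv]          # V^{-1} 1 (V is symmetric)
    s = sum(v1)                               # 1' V^{-1} 1
    T = [[V_inv[i][j] - v1[i] * v1[j] / s for j in range(g)] for i in range(g)]
    Ty = [sum(T[i][j] * y[j] for j in range(g)) for i in range(g)]
    ATy = [sum(A[i][j] * Ty[j] for j in range(g)) for i in range(g)]
    quad_g = sum(a * c for a, c in zip(Ty, ATy))   # y' T A T y
    quad_e = sum(a * a for a in Ty)                # y' T T y
    tr_TA = sum(T[i][j] * A[j][i] for i in range(g) for j in range(g))
    tr_T = sum(T[i][i] for i in range(g))
    new_g = var_g + var_g ** 2 / g * (quad_g - tr_TA)   # Eq. (2.22)
    new_e = var_e + var_e ** 2 / g * (quad_e - tr_T)    # Eq. (2.23)
    return new_g, new_e, T


# Hypothetical example: four genotypes forming two unrelated full-sib pairs
A = [[1.0, 0.5, 0.0, 0.0], [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5], [0.0, 0.0, 0.5, 1.0]]
y = [10.2, 9.1, 12.4, 11.8]
new_g, new_e, T = em_reml_step(y, A, var_g=1.0, var_e=1.0)
print(new_g, new_e)
```

In practice the step would be repeated, feeding the updated values back in, until the changes in both components become negligible. A production implementation would also need safeguards (for example, against negative updates) that are omitted here.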

The additive genetic and residual covariances between the observations of the $q$th and $i$th traits, $\mathbf{y}_q$ and $\mathbf{y}_i$ ($\sigma_{g_{iq}}$ and $\sigma_{e_{iq}}$; $q, i = 1, 2, \dots, t$), can be estimated using REML by adapting Eqs. (2.22) and (2.23). Note that the variance of the sum of $\mathbf{y}_q$ and $\mathbf{y}_i$ can be written as $Var(\mathbf{y}_i + \mathbf{y}_q) = \mathbf{V}_i + \mathbf{V}_q + 2\mathbf{C}_{iq}$, where $\mathbf{V}_i = \mathbf{A}\sigma_{g_i}^2 + \mathbf{I}\sigma_{e_i}^2$ is the variance of $\mathbf{y}_i$ and $\mathbf{V}_q = \mathbf{A}\sigma_{g_q}^2 + \mathbf{I}\sigma_{e_q}^2$ is the variance of $\mathbf{y}_q$; in addition, $2\mathbf{C}_{iq} = 2\mathbf{A}\sigma_{g_{iq}} + 2\mathbf{I}\sigma_{e_{iq}} = 2Cov(\mathbf{y}_i, \mathbf{y}_q)$ is twice the covariance of $\mathbf{y}_q$ and $\mathbf{y}_i$, and $\sigma_{g_{iq}}$ and $\sigma_{e_{iq}}$ are the additive and residual covariances respectively associated with the covariance of $\mathbf{y}_q$ and $\mathbf{y}_i$. Thus, one way of estimating $\sigma_{g_{iq}}$ and $\sigma_{e_{iq}}$ is by using the following equation:

$$0.5Var(\mathbf{y}\_i + \mathbf{y}\_q) - 0.5Var(\mathbf{y}\_i) - 0.5Var(\mathbf{y}\_q),\tag{2.24}$$

for which Eqs. (2.22) and (2.23) can be used. Equations (2.22) to (2.24) are used to estimate P and G in the illustrative examples of this book.
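The algebra behind Eq. (2.24) can be illustrated with ordinary sample variances. The sketch below is only an analogy: it uses `statistics.variance` from the Python standard library on two short hypothetical data vectors, rather than the REML variance components of Eqs. (2.22) and (2.23), and the function names are ours.

```python
from statistics import variance


def covariance_from_variances(y_i, y_q):
    """Covariance recovered as 0.5*Var(y_i + y_q) - 0.5*Var(y_i) - 0.5*Var(y_q)."""
    total = [a + c for a, c in zip(y_i, y_q)]
    return 0.5 * variance(total) - 0.5 * variance(y_i) - 0.5 * variance(y_q)


def sample_covariance(a, c):
    """Direct sample covariance with the usual n - 1 denominator."""
    n = len(a)
    mean_a, mean_c = sum(a) / n, sum(c) / n
    return sum((x - mean_a) * (z - mean_c) for x, z in zip(a, c)) / (n - 1)


# Hypothetical observations of two traits on five genotypes
y_i = [10.0, 12.0, 9.0, 13.0, 11.0]
y_q = [5.0, 4.0, 6.5, 3.5, 5.0]

cov_indirect = covariance_from_variances(y_i, y_q)
cov_direct = sample_covariance(y_i, y_q)
print(cov_indirect, cov_direct)
```

Both routes give the same number, which is why running the variance-component equations on the trait sum and on each trait separately is enough to recover the covariance components.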

#### 2.8 Numerical Examples

#### 2.8.1 Simulated Data

This data set was simulated by Ceron-Rojas et al. (2015) and can be obtained at http://hdl.handle.net/11529/10199. The data were simulated for eight phenotypic selection cycles (C0 to C7), each with four traits (T1, T2, T3 and T4), 500 genotypes, and four replicates for each genotype (Fig. 2.7). The LPSI economic weights for T1, T2, T3 and T4 were 1, -1, 1, and 1 respectively. Each of the four traits was affected by a different number of quantitative trait loci (QTLs): 300, 100, 60, and 40, respectively. The common QTLs affecting the traits generated genotypic correlations of -0.5, 0.4, 0.3, -0.3, -0.2, and 0.1 between T1 and T2, T1 and T3, T1 and T4, T2 and T3, T2 and T4, and T3 and T4 respectively. The genotypic value of each plant was generated based on its haplotypes and the QTL effects for each trait.

Fig. 2.7 Schematic illustration of the steps followed to generate data sets 1 and 2 for the seven selection cycles using the linear phenotypic selection index and the linear genomic selection index. Dotted lines indicate the process used to simulate the phenotypic data (according to Ceron-Rojas et al. 2015)

Simulated data were generated using QU-GENE software (Podlich and Cooper 1998; Wang et al. 2003). A total of 2500 molecular markers were distributed uniformly across 10 chromosomes, whereas 315 QTLs were randomly allocated over the ten chromosomes to simulate one maize (Zea mays L.) population. Each QTL and molecular marker was biallelic and the QTL additive values ranged from 0 to 0.5. As QU-GENE uses recombination fraction rather than map distance to calculate the probability of crossover events, recombination between adjacent pairs of markers was set at 0.0906. For two flanking markers, the QTL was either on the first marker (recombination between the first marker and the QTL equal to 0.0) or on the second marker (recombination between the first marker and the QTL equal to 0.0906). The exception was the recombination fraction between 15 random QTLs and their flanking markers, which was set at 0.5, i.e., complete independence (Haldane 1919), to simulate linkage equilibrium between 5% of the QTLs and their flanking markers. In addition, in every case, two adjacent QTLs were in complete linkage. For each trait, the phenotypic value for each of four replications of each plant was obtained from QU-GENE by setting the per-plot heritability of T1, T2, T3, and T4 at 0.4, 0.6, 0.6, and 0.8 respectively.

#### 2.8.2 Estimated Matrices, LPSI, and Its Parameters

For this example, we used only cycle C1 data and traits T1, T2, and T3. The phenotypic and genotypic estimated covariance matrices for traits T1, T2, and T3 were

$$\hat{\mathbf{P}} = \begin{bmatrix} 62.50 & -12.74 & 8.53 \\ -12.74 & 17.52 & -3.38 \\ 8.53 & -3.38 & 12.31 \end{bmatrix} \quad \text{and} \quad \hat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 \\ -12.93 & 13.04 & -3.40 \\ 8.35 & -3.40 & 9.96 \end{bmatrix}$$

respectively, whereas the inverse of matrix $\hat{\mathbf{P}}$ was

$$\hat{\mathbf{P}}^{-1} = \begin{bmatrix} 0.01997 & 0.01251 & -0.01040 \\ 0.01251 & 0.06809 & 0.01005 \\ -0.01040 & 0.01005 & 0.09123 \end{bmatrix}.$$

The estimated heritabilities for T1, T2, and T3 were $\hat{h}_1^2 = 0.579$, $\hat{h}_2^2 = 0.744$, and $\hat{h}_3^2 = 0.809$ respectively.

According to matrices $\widehat{\mathbf{P}}^{-1}$ and $\widehat{\mathbf{G}}$, and because $\mathbf{w}' = \begin{bmatrix} 1 & -1 & 1 \end{bmatrix}$, the estimated vector of coefficients was $\widehat{\mathbf{b}}' = \mathbf{w}'\widehat{\mathbf{G}}\widehat{\mathbf{P}}^{-1} = \begin{bmatrix} 0.555 & -1.063 & 1.087 \end{bmatrix}$, from which the estimated LPSI can be written as $\widehat{I} = 0.555T_1 - 1.063T_2 + 1.087T_3$. Table 2.2 presents the first 20 genotypes, the means of the three traits (T1, T2 and T3) and the first 20 estimated unranked LPSI values of the 500 simulated genotypes for cycle C1.
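The coefficient vector can be checked numerically. The following Python sketch solves $\widehat{\mathbf{P}}\widehat{\mathbf{b}} = \widehat{\mathbf{G}}\mathbf{w}$ by Gaussian elimination, using the rounded matrices printed in this section. The helper `solve` and all variable names are illustrative and are not part of QU-GENE or RIndSel. The sketch also assumes that each heritability is the ratio of the corresponding diagonal elements of $\widehat{\mathbf{G}}$ and $\widehat{\mathbf{P}}$. Small differences from the values in the text come from rounding.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


# Rounded estimates for traits T1, T2, T3 in cycle C1 (from the text).
P_hat = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G_hat = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [1.0, -1.0, 1.0]  # economic weights

Gw = [sum(g * wi for g, wi in zip(row, w)) for row in G_hat]
b_hat = solve(P_hat, Gw)  # LPSI coefficients b = P^{-1} G w

# Assumption: heritability of trait i taken as G_ii / P_ii.
h2 = [G_hat[i][i] / P_hat[i][i] for i in range(3)]

# Index value for the first genotype, using its trait means from the text.
means_genotype_1 = [164.46, 39.63, 34.66]
I_1 = sum(b * m for b, m in zip(b_hat, means_genotype_1))

print([round(b, 3) for b in b_hat], [round(h, 3) for h in h2], round(I_1, 2))
```

Solving the linear system directly avoids forming $\widehat{\mathbf{P}}^{-1}$ explicitly, which is usually the more stable choice when the number of traits grows.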


Table 2.2 Number of genotypes, means of the trait (T1, T2 and T3) values, and unranked values of the LPSI for part of a simulated data set

According to the means of the three traits, the first estimated LPSI value was obtained as

$$\widehat{I}_1 = 0.555(164.46) - 1.063(39.63) + 1.087(34.66) = 86.81;$$

the second estimated LPSI value was obtained as

$$\widehat{I}_2 = 0.555(144.39) - 1.063(144.39) + 1.087(34.65) = 63.82, \text{ etc.};$$

and the 20th estimated LPSI value was obtained as

$$\widehat{I}_{20} = 0.555(161.80) - 1.063(46.58) + 1.087(37.33) = 80.84.$$

This estimation procedure is valid for any number of genotypes. Table 2.3 presents the 20 genotypes ranked by the estimated LPSI values. Note that if we use 20% selection intensity for Table 2.2 data, we should select genotypes 12, 18, 1, 6, and 10, because their estimated LPSI values are higher than the remaining LPSI values for that set of genotypes. Using the idea described in Fig. 2.4, genotypes 12, 18, 1, 6, and 10 should be in the red zone, whereas the rest of the genotypes are in the white zone and should be culled. Here, the proportion selected is $q = 0.2$ and $z = \frac{\exp\{-0.5u'^2\}}{\sqrt{2\pi}} = 0.31$, where $u' = \frac{81.35 - 75.64}{8.11} = 0.704$, 81.35 is the estimated LPSI value of genotype number 10, 75.64 is the mean of the 20 LPSI values, and 8.11 is the standard deviation of the estimated LPSI values of the 20 genotypes presented in Tables 2.2 and 2.3.

Table 2.3 Number of genotypes, means of the trait (T1, T2 and T3) values and ranked values of the LPSI for part of a simulated data set
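As a small check of the truncation point and the ordinate of the standard normal density quoted for the 20 genotypes, the values can be reproduced in Python. This is only a sketch using the three numbers reported in the text; the variable names are illustrative.

```python
import math

cutoff_index = 81.35  # LPSI value of genotype 10, the lowest selected value
mean_index = 75.64    # mean of the 20 estimated LPSI values
sd_index = 8.11       # standard deviation of the 20 estimated LPSI values

# Standardized truncation point u'
u = (cutoff_index - mean_index) / sd_index

# Height of the standard normal density at the truncation point
z = math.exp(-0.5 * u ** 2) / math.sqrt(2.0 * math.pi)

print(round(u, 3), round(z, 2))
```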

Table 2.4 presents 25 genotypes and the means of the three traits obtained from the 500 simulated genotypes for cycle C1 and ranked by the estimated LPSI values. In this case, we used 5% selection intensity ($k_I = 2.063$). Also, the last four rows in Table 2.4 give:


The variance of the estimated selection index for the 500 genotypes was $\widehat{V}(\widehat{I}) = \widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}} = 86.72$, from which the standard deviation of $\widehat{I}$ was 9.312. The estimated standardized selection differentials for the LPSI can be obtained from Table A in Falconer and Mackay (1996), where, for 5% selection intensity, $k_I = 2.063$. This means that the estimated LPSI selection response was $\widehat{R} = 2.063(9.312) = 19.21$, whereas the expected genetic gain per trait, or multitrait selection response, was

$$\widehat{\mathbf{E}}' = 2.063\left[\frac{\widehat{\mathbf{b}}'\widehat{\mathbf{G}}}{9.312}\right] = \begin{bmatrix} 9.51 & -5.48 & 4.22 \end{bmatrix}.$$

Table 2.4 Number of selected genotypes, selected means of the trait (T1, T2 and T3) values and ranked selected values of the LPSI from one simulated set of 500 genotypes with four repetitions

The selection intensity was 5%
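The selection response and the expected genetic gain per trait reported above can be reproduced from the rounded matrices of Sect. 2.8.2. The Python sketch below is an illustration under that assumption; the helper `solve` and the variable names are not taken from any software used in the book.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


P_hat = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G_hat = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [1.0, -1.0, 1.0]
k_I = 2.063  # selection intensity for 5% selected (from the text)

b_hat = solve(P_hat, matvec(G_hat, w))          # b = P^{-1} G w
var_I = sum(b * x for b, x in zip(b_hat, matvec(P_hat, b_hat)))  # b'Pb
sd_I = math.sqrt(var_I)

R_hat = k_I * sd_I                               # selection response
E_hat = [k_I * x / sd_I for x in matvec(G_hat, b_hat)]  # gain per trait

print(round(var_I, 2), round(sd_I, 3), round(R_hat, 2),
      [round(e, 2) for e in E_hat])
```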

#### 2.8.3 LPSI Efficiency Versus Base Index Efficiency

The estimated correlation between the LPSI and the net genetic merit was $\widehat{\rho}_{HI} = \frac{\widehat{\sigma}_I}{\widehat{\sigma}_H} = 0.894$, whereas the estimated correlation between the base index and the net genetic merit was $\widehat{\rho}_{HI_B} = 0.875$; thus $\widehat{\lambda} = \frac{\widehat{\rho}_{HI}}{\widehat{\rho}_{HI_B}} = 1.0217$ and, by Eq. (2.21), $\widehat{p} = 100(\widehat{\lambda} - 1) = 2.171$. This means that LPSI efficiency was only 2.2% higher than the base index efficiency for this data set.
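The efficiency comparison can be sketched in Python as follows. The block recomputes both correlations from the rounded matrices of Sect. 2.8.2, assuming that the base index is $I_B = \mathbf{w}'\mathbf{y}$ so that its correlation with $H$ is $\mathbf{w}'\mathbf{Gw}/\sqrt{\mathbf{w}'\mathbf{Gw}\,\mathbf{w}'\mathbf{Pw}}$. It then forms $\widehat{\lambda}$ and $\widehat{p}$ from the rounded correlations reported in the text. Helper names are illustrative, and the recomputed correlations differ slightly from the reported ones because of rounding.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def quad(u, M, v):
    """Bilinear form u' M v."""
    return sum(ui * sum(m * vj for m, vj in zip(row, v)) for ui, row in zip(u, M))


P_hat = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G_hat = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [1.0, -1.0, 1.0]

Gw = [sum(g * wi for g, wi in zip(row, w)) for row in G_hat]
b_hat = solve(P_hat, Gw)

sigma_H = math.sqrt(quad(w, G_hat, w))
rho_HI = math.sqrt(quad(b_hat, P_hat, b_hat)) / sigma_H

# Assumption: base index I_B = w'y, so Cov(H, I_B) = w'Gw and Var(I_B) = w'Pw.
rho_HIB = quad(w, G_hat, w) / (sigma_H * math.sqrt(quad(w, P_hat, w)))

# Relative efficiency from the rounded correlations reported in the text.
lam = 0.894 / 0.875
p = 100.0 * (lam - 1.0)

print(round(rho_HI, 3), round(rho_HIB, 3), round(lam, 4), round(p, 3))
```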

Using the same data set described in Sect. 2.8.1 of this chapter, we conducted seven selection cycles (C1 to C7) for the four traits (T1, T2, T3, and T4) using the LPSI and the BLPSI. These results are presented in Table 2.5. To compare the LPSI efficiency versus BLPSI efficiency, we obtained the true selection response of the simulated data (second column in Table 2.5) and we estimated the LPSI and BLPSI selection response for each selection cycle (third column in Table 2.5); in addition, we estimated the LPSI and BLPSI expected genetic gain per trait for each selection cycle (columns 4 to 7 in Table 2.5). The first part of Table 2.5 shows the true selection response and the estimated values of the LPSI selection response and expected genetic gain per trait. In a similar manner, the second part of Table 2.5 shows the true selection response, the estimated values of the BLPSI selection response, and the expected genetic gain per trait. The average value of the true selection response was equal to 14.43, whereas the average values of the estimated LPSI and BLPSI selection response were 14.19 and 19.34 respectively. Note that $14.43 - 14.19 = 0.24$, but $19.34 - 14.43 = 4.91$. According to this result, the BLPSI over-estimated the true selection response of the simulated data by 34.7%. Thus, based on the Table 2.5 results and those presented in Fig. 2.6, we can conclude that the LPSI was more efficient than the BLPSI for this data set.

Table 2.5 The LPSI and BLPSI responses (true and estimated) and estimated expected genetic gain per trait for seven simulated selection cycles

The selection intensity was 10% ($k_I = 1.755$)

Finally, additional results can be seen in Chap. 10, where the LPSI was simulated for many selection cycles. Chapter 11 describes RIndSel: a program that uses R and the selection index theory to make selection.

#### 2.9 The LPSI and Its Relationship with the Quadratic Phenotypic Selection Index

In the nonlinear selection index theory, the net genetic merit and the index are both nonlinear. There are many types of nonlinear indices; Goddard (1983) and Weller et al. (1996) have reviewed the general theory of nonlinear selection indices. In this chapter, we describe only the simplest of them: the quadratic index developed mainly by Wilton et al. (1968), Wilton (1968), and Wilton and Van Vleck (1969), which is related to the LPSI.

#### 2.9.1 The Quadratic Nonlinear Net Genetic Merit

The most common form of writing the quadratic net genetic merit is

$$H_q = \alpha + \mathbf{w}'(\boldsymbol{\mu} + \mathbf{g}) + (\boldsymbol{\mu} + \mathbf{g})' \mathbf{A}(\boldsymbol{\mu} + \mathbf{g}),\tag{2.25}$$

where $\alpha$ is a constant, $\mathbf{g}$ is the vector of breeding values, which has a normal distribution with zero mean and covariance matrix $\mathbf{G}$, $\boldsymbol{\mu}$ is the vector of population means, and $\mathbf{w}$ is a vector of economic weights. In addition, matrix $\mathbf{A}$ can be written as

$$\mathbf{A} = \begin{bmatrix} w_1 & 0.5w_{12} & \cdots & 0.5w_{1t} \\ 0.5w_{12} & w_2 & \cdots & 0.5w_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ 0.5w_{1t} & 0.5w_{2t} & \cdots & w_t \end{bmatrix},$$

where the $i$th diagonal value $w_i$ ($i = 1, 2, \ldots, t$) is the relative economic weight of the genetic value of the squared trait $i$ and $w_{ij}$ ($i, j = 1, 2, \ldots, t$) is the economic weight of the cross products between the genetic values of traits $i$ and $j$. The main difference between the linear net genetic merit (Eq. 2.2) and the net quadratic merit (Eq. 2.25) is that the latter depends on $\boldsymbol{\mu}$ and $(\boldsymbol{\mu} + \mathbf{g})'\mathbf{A}(\boldsymbol{\mu} + \mathbf{g})$.

#### 2.9.2 The Quadratic Index

The quadratic phenotypic selection index is

$$I_q = \beta + \mathbf{b}'\mathbf{y} + \mathbf{y}'\mathbf{B}\mathbf{y} \tag{2.26}$$

where $\beta$ is a constant, $\mathbf{y}$ is the vector of phenotypic values that has a multivariate normal distribution with zero mean and covariance matrix $\mathbf{P}$, $\mathbf{b}' = \begin{bmatrix} b_1 & b_2 & \cdots & b_t \end{bmatrix}$ is a vector of coefficients, and

$$\mathbf{B} = \begin{bmatrix} b_1 & 0.5b_{12} & \cdots & 0.5b_{1t} \\ 0.5b_{12} & b_2 & \cdots & 0.5b_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ 0.5b_{1t} & 0.5b_{2t} & \cdots & b_t \end{bmatrix}.$$

In matrix $\mathbf{B}$, the $i$th diagonal value $b_i$ ($i = 1, 2, \ldots, t$) is the index weight for the square of the phenotypic value of trait $i$ and $b_{ij}$ ($i, j = 1, 2, \ldots, t$) is the index weight for the cross products between the phenotypic values of traits $i$ and $j$.

#### 2.9.3 The Vector and the Matrix of Coefficients of the Quadratic Index

As we saw in Sect. 2.3.2 of this chapter, to obtain the vector ($\mathbf{b}$) and the matrix ($\mathbf{B}$) of coefficients of the quadratic index that maximize the selection response, we can minimize the expectation of the squared difference between the quadratic index ($I_q$) and the quadratic net genetic merit ($H_q$), $\Phi = E\{[I_q - E(I_q)] - [H_q - E(H_q)]\}^2$, or we can maximize the correlation between $I_q$ and $H_q$, i.e., $\rho_{H_qI_q} = \frac{Cov(H_q, I_q)}{\sqrt{Var(I_q)}\sqrt{Var(H_q)}}$, where $Cov(H_q, I_q)$ is the covariance between $I_q$ and $H_q$, $\sqrt{Var(I_q)}$ is the standard deviation of $I_q$, and $\sqrt{Var(H_q)}$ is the standard deviation of $H_q$. In this context, it is easier to maximize $\rho_{H_qI_q}$ than to minimize $\Phi$. Vandepitte (1972) minimized $\Phi$, but in this section we shall maximize $\rho_{H_qI_q}$.

Suppose that $\boldsymbol{\mu} = \mathbf{0}$; since $\alpha$ and $\beta$ are constants that do not affect $\rho_{H_qI_q}$, we can write $I_q$ and $H_q$ as $I_q = \mathbf{b}'\mathbf{y} + \mathbf{y}'\mathbf{By}$ and $H_q = \mathbf{w}'\mathbf{g} + \mathbf{g}'\mathbf{Ag}$. Thus, under the assumption that $\mathbf{y}$ and $\mathbf{g}$ have multivariate normal distributions with mean $\mathbf{0}$ and covariance matrices $\mathbf{P}$ and $\mathbf{G}$, respectively, $E(I_q) = tr(\mathbf{BP})$ and $E(H_q) = tr(\mathbf{AG})$ are the expectations of $I_q$ and $H_q$, whereas $Var(I_q) = \mathbf{b}'\mathbf{Pb} + 2tr[(\mathbf{BP})^2]$ and $Var(H_q) = \mathbf{w}'\mathbf{Gw} + 2tr[(\mathbf{AG})^2]$ are the variances of $I_q$ and $H_q$, respectively. The covariance between $I_q$ and $H_q$ is $Cov(H_q, I_q) = \mathbf{w}'\mathbf{Gb} + 2tr(\mathbf{BGAG})$ (Vandepitte 1972), where $tr(\circ)$ denotes the trace function of matrices.

According to the foregoing results, we can maximize the natural logarithm of $\rho_{H_qI_q}$, $\ln \rho_{H_qI_q}$, with respect to vector $\mathbf{b}$ and matrix $\mathbf{B}$, assuming that $\mathbf{w}$, $\mathbf{A}$, $\mathbf{P}$, and $\mathbf{G}$ are known. Hence, except for two proportional constants that do not affect the maximum value of $\rho_{H_qI_q}$ because this is invariant to a change of scale, setting the derivatives of $\ln \rho_{H_qI_q}$ with respect to $\mathbf{b}$ and $\mathbf{B}$ to zero gives

$$\mathbf{b} = \mathbf{P}^{-1} \mathbf{G} \mathbf{w} \text{ and } \mathbf{B} = \mathbf{P}^{-1} \mathbf{G} \mathbf{A} \mathbf{G} \mathbf{P}^{-1},\tag{2.27}$$

respectively. In this case, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$ is the same as the LPSI vector of coefficients (see Eq. 2.8 for details); however, when $\boldsymbol{\mu} \neq \mathbf{0}$, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}(\mathbf{w} + 2\mathbf{A}\boldsymbol{\mu}) = \mathbf{P}^{-1}\mathbf{Gw} + 2\mathbf{P}^{-1}\mathbf{GA}\boldsymbol{\mu}$. In the latter case, $\mathbf{b}$ has the additional term $2\mathbf{P}^{-1}\mathbf{GA}\boldsymbol{\mu}$, which is null when $\boldsymbol{\mu} = \mathbf{0}$ or $\mathbf{A} = \mathbf{0}$. Hence, when $\boldsymbol{\mu} \neq \mathbf{0}$ the quadratic index vector $\mathbf{b}$ has two components: $\mathbf{P}^{-1}\mathbf{Gw}$, which is the LPSI vector of coefficients, and $2\mathbf{P}^{-1}\mathbf{GA}\boldsymbol{\mu}$, which is a function of the current population mean $\boldsymbol{\mu}$ multiplied by matrix $\mathbf{A}$. Therefore, when $\boldsymbol{\mu} \neq \mathbf{0}$ and $\mathbf{A} \neq \mathbf{0}$, the quadratic index vector $\mathbf{b}$ will change when the $\boldsymbol{\mu}$ values change. However, $\boldsymbol{\mu}$ does not affect matrix $\mathbf{B}$.
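The dependence of $\mathbf{b}$, but not of $\mathbf{B}$, on the population mean can be illustrated numerically. The Python sketch below uses the estimated $\widehat{\mathbf{P}}$ and $\widehat{\mathbf{G}}$ of Sect. 2.8.2 as if they were the known $\mathbf{P}$ and $\mathbf{G}$. The matrix `A` and the mean vector `mu_shifted` are hypothetical values chosen only for illustration and do not come from the book, and the helper functions are likewise illustrative.

```python
def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, B):
    """Return X with A X = B (Gauss-Jordan, partial pivoting); B is a matrix."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [[1.0], [-1.0], [1.0]]  # column vector of economic weights

# Hypothetical quadratic economic weights (symmetric), for illustration only.
A = [[0.010, 0.005, 0.000],
     [0.005, -0.020, 0.000],
     [0.000, 0.000, 0.010]]


def quadratic_coefficients(mu):
    """Return (b, B) of Eq. (2.27), with b extended to P^{-1} G (w + 2 A mu)."""
    A_mu = matmul(A, mu)
    rhs = [[wi[0] + 2.0 * am[0]] for wi, am in zip(w, A_mu)]
    b = solve(P, matmul(G, rhs))
    X = solve(P, matmul(matmul(G, A), G))   # P^{-1} G A G
    B = transpose(solve(P, transpose(X)))   # X P^{-1}, using P' = P
    return b, B


mu_zero = [[0.0], [0.0], [0.0]]
mu_shifted = [[1.0], [0.5], [-0.5]]         # hypothetical population means

b0, B0 = quadratic_coefficients(mu_zero)
b1, B1 = quadratic_coefficients(mu_shifted)
print([round(x[0], 3) for x in b0])
print([round(x[0], 3) for x in b1])
```

With `mu_zero` the vector equals the LPSI coefficients, whereas `mu_shifted` changes $\mathbf{b}$ and leaves $\mathbf{B}$ untouched.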

#### 2.9.4 The Accuracy and Maximized Selection Response of the Quadratic Index

According to the results of Eq. (2.27), $Var(I_q) = Cov(H_q, I_q) = \mathbf{b}'\mathbf{Pb} + 2tr[(\mathbf{BP})^2]$, which means that the quadratic index accuracy and the maximized selection response can be written as:

$$\rho\_{H\_{q}I\_{q}} = \frac{\sqrt{\mathbf{w}' \mathbf{G} \mathbf{P}^{-1} \mathbf{G} \mathbf{w} + 2tr\left[\left(\mathbf{P}^{-1} \mathbf{G} \mathbf{A} \mathbf{G}\right)^{2}\right]}}{\sqrt{\mathbf{w}' \mathbf{G} \mathbf{w} + 2tr\left[\left(\mathbf{A} \mathbf{G}\right)^{2}\right]}}\tag{2.28}$$

and

$$R\_q = k \sqrt{\mathbf{w'} \mathbf{G} \mathbf{P}^{-1} \mathbf{G} \mathbf{w} + 2tr \left[ \left( \mathbf{P}^{-1} \mathbf{G} \mathbf{A} \mathbf{G} \right)^2 \right]},\tag{2.29}$$

respectively, where $k$ is the selection intensity of the quadratic index. Equations (2.27) to (2.29) indicate that the LPSI and the quadratic index are related, and the only difference between them is the quadratic terms. Wilton et al. (1968) wrote Eq. (2.29) as: $R_q = k\sqrt{\mathbf{b}'\mathbf{Pb}} + k\,2tr\left[(\mathbf{BP})^2\right]$.
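The identity $Var(I_q) = Cov(H_q, I_q)$ and Eqs. (2.28) and (2.29) can be checked numerically. The following Python sketch again treats the estimated matrices of Sect. 2.8.2 as known, assumes $\boldsymbol{\mu} = \mathbf{0}$, and uses a hypothetical matrix `A_example` that is not taken from the book. When $\mathbf{A}$ is null, the quadratic terms vanish and the LPSI accuracy and response should be recovered.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def solve(A, B):
    """Return X with A X = B (Gauss-Jordan, partial pivoting); B is a matrix."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [[1.0], [-1.0], [1.0]]


def quadratic_index_summary(A, k):
    """Return (Var(I_q), Cov(H_q, I_q), accuracy, response) for mu = 0."""
    b = solve(P, matmul(G, w))                       # b = P^{-1} G w
    X = solve(P, matmul(matmul(G, A), G))            # P^{-1} G A G
    B = transpose(solve(P, transpose(X)))            # B = P^{-1} G A G P^{-1}
    BP, AG = matmul(B, P), matmul(A, G)
    var_I = matmul(transpose(b), matmul(P, b))[0][0] + 2.0 * trace(matmul(BP, BP))
    cov_HI = (matmul(transpose(w), matmul(G, b))[0][0]
              + 2.0 * trace(matmul(matmul(B, G), AG)))
    var_H = matmul(transpose(w), matmul(G, w))[0][0] + 2.0 * trace(matmul(AG, AG))
    return var_I, cov_HI, math.sqrt(var_I / var_H), k * math.sqrt(var_I)


# Hypothetical quadratic economic weights, for illustration only.
A_example = [[0.010, 0.005, 0.000],
             [0.005, -0.020, 0.000],
             [0.000, 0.000, 0.010]]
A_null = [[0.0] * 3 for _ in range(3)]

var_q, cov_q, rho_q, R_q = quadratic_index_summary(A_example, 2.063)
var_l, cov_l, rho_l, R_l = quadratic_index_summary(A_null, 2.063)
print(round(rho_q, 3), round(R_q, 2), round(rho_l, 3), round(R_l, 2))
```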




### Chapter 3 Constrained Linear Phenotypic Selection Indices

Abstract The linear phenotypic selection index (LPSI), the null restricted LPSI (RLPSI), and the predetermined proportional gains LPSI (PPG-LPSI) are the main phenotypic selection indices used to predict the net genetic merit and select parents for the next selection cycle. The LPSI is an unrestricted index, whereas the RLPSI and the PPG-LPSI allow restrictions equal to zero and predetermined proportional gain restrictions respectively to be imposed on the expected genetic gain values of the trait to make some traits change their mean values based on a predetermined level while the rest of the trait means remain without restrictions. One additional restricted index is the desired gains LPSI (DG-LPSI), which does not require economic weights and, in a similar manner to the PPG-LPSI, allows restrictions to be imposed on the expected genetic gain values of the trait to make some traits change their mean values based on a predetermined level. The aims of RLPSI and PPG-LPSI are to maximize the selection response, the expected genetic gains per trait, and provide the breeder with an objective rule for evaluating and selecting parents for the next selection cycle based on several traits. This chapter describes the theory and practice of the RLPSI, PPG-LPSI, and DG-LPSI. We show that the PPG-LPSI is the most general index and includes the LPSI and the RLPSI as particular cases. Finally, we describe the DG-LPSI as a modification of the PPG-LPSI. We illustrate the theoretical results of all the indices using real and simulated data.

#### 3.1 The Null Restricted Linear Phenotypic Selection Index

Conditions to construct a valid null restricted linear phenotypic selection index (RLPSI) are the same as those described in Sect. 2.1 of Chap. 2. The main objective of the RLPSI is to optimize, under some null restrictions, the selection response, to predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$ and select the individuals with the highest net genetic merit values as parents of the next generation. The RLPSI allows restrictions equal to zero to be imposed on the expected genetic gains of some traits, whereas other traits increase (or decrease) their expected genetic gains without imposing any restrictions. The RLPSI solves the LPSI equations subject to the condition that the covariance between the index and some linear functions of the genotypes involved be zero, thus preventing selection on the RLPSI from causing any genetic change in some expected genetic gains of the traits (Cunningham et al. 1970).

Vector $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$ maximizes the LPSI selection response, expected genetic gains per trait, and the correlation between the LPSI and $H = \mathbf{w}'\mathbf{g}$. In this section, we show that the vector of the RLPSI coefficients, $\mathbf{b}_R = \mathbf{Kb}$:


Vector $\mathbf{b}_R = \mathbf{Kb}$ is a linear transformation of the LPSI vector of coefficients ($\mathbf{b}$) made by the projector matrix $\mathbf{K}$. Matrix $\mathbf{K}$ is idempotent ($\mathbf{K} = \mathbf{K}^2$) and projects $\mathbf{b}$ into a space smaller than the original space of $\mathbf{b}$ because the restrictions imposed on the expected genetic gains per trait are equal to zero. The reduction of the space into which matrix $\mathbf{K}$ projects $\mathbf{b}$ is equal to the number of null restrictions imposed by the breeder on the expected genetic gain per trait, or multi-trait selection response (Cerón-Rojas et al. 2016).

The covariance between the breeding value vector ($\mathbf{g}$) and the LPSI ($I = \mathbf{b}'\mathbf{y}$) is $Cov(I, \mathbf{g}) = \mathbf{Gb}$. Suppose that the breeder is interested in improving only $(t - r)$ of $t$ traits ($r < t$), leaving $r$ of them fixed; that is, the expected genetic gains of $r$ traits are equal to zero for a specific selection cycle. Thus, we want the $r$ covariances between the linear combinations of $\mathbf{g}$ ($\mathbf{U}'\mathbf{g}$) and $I = \mathbf{b}'\mathbf{y}$ to be zero, i.e., $Cov(I, \mathbf{U}'\mathbf{g}) = \mathbf{U}'\mathbf{Gb} = \mathbf{0}$, where $\mathbf{U}'$ is a matrix of 1's and 0's in which each of the $r$ rows has a 1 in the position of one restricted trait and 0's elsewhere; 1 indicates that the trait is restricted and 0 that the trait is not restricted. That is, in the linear combinations of $\mathbf{g}$ ($\mathbf{U}'\mathbf{g}$), 1 is the coefficient of the genotypes that have covariance equal to zero with the LPSI, whereas the genotypes with coefficient 0 have no restriction on the expected genetic gains. We can solve this problem by maximizing the correlation between $I$ and $H$ ($\rho_{HI}$) or by minimizing the mean squared difference between $I$ and $H$, $E[(H - I)^2]$, under the restriction $\mathbf{U}'\mathbf{Gb} = \mathbf{0}$.

#### 3.1.1 The Maximized RLPSI Parameters

In the LPSI context, vector $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$ minimizes the mean squared difference between $I$ and $H$, $E[(H - I)^2] = \mathbf{w}'\mathbf{Gw} + \mathbf{b}'\mathbf{Pb} - 2\mathbf{w}'\mathbf{Gb}$. Let $\mathbf{C}' = \mathbf{U}'\mathbf{G}$ and $\mathbf{C}'\mathbf{b} = \mathbf{0}$; we need to minimize $E[(H - I)^2]$ with respect to $\mathbf{b}$ under the restriction $\mathbf{C}'\mathbf{b} = \mathbf{0}$. Thus, assuming that $\mathbf{P}$, $\mathbf{G}$, $\mathbf{U}'$ and $\mathbf{w}$ are known, we need to minimize the function

$$\Psi(\mathbf{b}, \mathbf{v}) = \mathbf{b}'\mathbf{P}\mathbf{b} + \mathbf{w}'\mathbf{G}\mathbf{w} - 2\mathbf{w}'\mathbf{G}\mathbf{b} + 2\mathbf{v}'\mathbf{C}'\mathbf{b} \tag{3.1}$$

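with respect to vectors $\mathbf{b}$ and $\mathbf{v}' = \begin{bmatrix} v_1 & v_2 & \cdots & v_{r-1} \end{bmatrix}$, where $\mathbf{v}$ is a vector of Lagrange multipliers. Setting the derivatives of $\Psi(\mathbf{b}, \mathbf{v})$ with respect to $\mathbf{b}$ and $\mathbf{v}$ to zero gives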

$$\mathbf{P}\mathbf{b} + \mathbf{C}\mathbf{v} = \mathbf{G}\mathbf{w}$$

and

$$\mathbf{C}'\mathbf{b} = \mathbf{0},$$

or, in matrix notation,

$$
\begin{bmatrix} \mathbf{P} & \mathbf{C} \\ \mathbf{C}' & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{b} \\ \mathbf{v} \end{bmatrix} = \begin{bmatrix} \mathbf{G} \mathbf{w} \\ \mathbf{0} \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} \mathbf{0} & \mathbf{C}' \\ \mathbf{C} & \mathbf{P} \end{bmatrix} \begin{bmatrix} \mathbf{v} \\ \mathbf{b} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{G} \mathbf{w} \end{bmatrix}. \tag{3.2}$$

In the latter case of Eq. (3.2), the solution is

$$
\begin{bmatrix} \mathbf{v} \\ \mathbf{b}_R \end{bmatrix} = \begin{bmatrix} \mathbf{0} & \mathbf{C}' \\ \mathbf{C} & \mathbf{P} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{0} \\ \mathbf{G} \mathbf{w} \end{bmatrix}, \tag{3.3}
$$

where $\begin{bmatrix} \mathbf{0} & \mathbf{C}' \\ \mathbf{C} & \mathbf{P} \end{bmatrix}^{-1}$ is the inverse of matrix $\begin{bmatrix} \mathbf{0} & \mathbf{C}' \\ \mathbf{C} & \mathbf{P} \end{bmatrix}$ and $\mathbf{b}_R$ is the RLPSI vector of coefficients. There is a mathematical algorithm (Searle 1966; Schott 2005) for finding matrix $\begin{bmatrix} \mathbf{0} & \mathbf{C}' \\ \mathbf{C} & \mathbf{P} \end{bmatrix}^{-1}$. It can be shown that

$$
\begin{bmatrix} \mathbf{0} & \mathbf{C}' \\ \mathbf{C} & \mathbf{P} \end{bmatrix}^{-1} = \begin{bmatrix} \left(-\mathbf{C}'\mathbf{P}^{-1}\mathbf{C}\right)^{-1} & \left(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C}\right)^{-1}\mathbf{C}'\mathbf{P}^{-1} \\ \mathbf{P}^{-1}\mathbf{C}\left(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C}\right)^{-1} & -\mathbf{P}^{-1}\mathbf{C}\left(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C}\right)^{-1}\mathbf{C}'\mathbf{P}^{-1} + \mathbf{P}^{-1} \end{bmatrix}, \tag{3.4}
$$

whence the RLPSI vector of coefficients ($\mathbf{b}_R$) that minimizes $E[(H - I)^2]$ and maximizes $\rho_{HI}$ under the restriction $\mathbf{C}'\mathbf{b} = \mathbf{0}$ can be written as

$$\mathbf{b}_R = \mathbf{K}\mathbf{b},\tag{3.5}$$

where $\mathbf{K} = [\mathbf{I} - \mathbf{Q}]$, $\mathbf{Q} = \mathbf{P}^{-1}\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{C}'$ and $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$; $\mathbf{P}^{-1}$ is the inverse of matrix $\mathbf{P}$ and $\mathbf{I}$ is an identity matrix of size $t \times t$. When there are no restrictions on any traits, $\mathbf{U}'$ is a null matrix and $\mathbf{b}_R = \mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$, the LPSI vector of coefficients. Thus, the RLPSI includes the LPSI as a particular case.
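The construction of $\mathbf{K}$ and its properties can be sketched in Python. As an illustration only, the block uses the three-trait estimates of Sect. 2.8.2 of Chap. 2 in place of the known $\mathbf{P}$ and $\mathbf{G}$ and restricts the first trait, which is an arbitrary choice. It checks that $\mathbf{K}$ is idempotent, that $\mathbf{C}'\mathbf{b}_R = \mathbf{0}$, and that solving the bordered system of Eq. (3.2) directly gives the same $\mathbf{b}_R$ as Eq. (3.5). The helper functions are illustrative.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, B):
    """Return X with A X = B (Gauss-Jordan, partial pivoting); B is a matrix."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [[1.0], [-1.0], [1.0]]
t = len(P)

U_t = [[1.0, 0.0, 0.0]]              # U' restricting the first trait (example)
r = len(U_t)
C_t = matmul(U_t, G)                 # C' = U'G, an r x t matrix
C = transpose(C_t)

PinvC = solve(P, C)                  # P^{-1} C
Q = matmul(PinvC, solve(matmul(C_t, PinvC), C_t))  # P^{-1}C (C'P^{-1}C)^{-1} C'
K = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(t)] for i in range(t)]

b = solve(P, matmul(G, w))           # LPSI coefficients
b_R = matmul(K, b)                   # RLPSI coefficients, Eq. (3.5)

# Bordered system of Eq. (3.2): [[P, C], [C', 0]] [b; v] = [Gw; 0]
bordered = [P[i] + C[i] for i in range(t)] + [C_t[i] + [0.0] * r for i in range(r)]
rhs = matmul(G, w) + [[0.0] for _ in range(r)]
b_R_direct = solve(bordered, rhs)[:t]

KK = matmul(K, K)
restriction = matmul(C_t, b_R)       # should be (numerically) zero
print([round(x[0], 3) for x in b_R])
```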

According to Eq. (3.5), the RLPSI can be written as

$$I\_R = \mathbf{b}'\_R \mathbf{y},\tag{3.6}$$

whereas the maximized correlation between the RLPSI and the net genetic merit is

$$\rho_{HI_R} = \frac{\mathbf{w}' \mathbf{G} \mathbf{b}_R}{\sqrt{\mathbf{w}' \mathbf{G} \mathbf{w}} \sqrt{\mathbf{b}_R' \mathbf{P} \mathbf{b}_R}}. \tag{3.7}$$

According to the conditions for constructing a valid RLPSI, the index $I_R = \mathbf{b}'_R\mathbf{y}$ should have a normal distribution. Using one and two null restrictions, this assumption is illustrated in Fig. 3.1 for a real maize (Zea mays) F2 population with 247 lines and four traits (grain yield in ton ha$^{-1}$, plant height in cm, ear height in cm, and anthesis day in days) evaluated in one environment. Figure 3.1 indicates that, in effect, the RLPSI values approach a normal distribution.

Fig. 3.1 (a) and (b) show the distributions of 247 values of the restricted linear phenotypic selection index (RLPSI), with one and two restrictions respectively, constructed with the phenotypic means of four maize (Zea mays) F2 population traits: grain yield (ton ha$^{-1}$), plant height (cm), ear height (cm), and anthesis day (days), evaluated in one environment

Under the null restrictions made by the breeder, $I_R = \mathbf{b}'_R\mathbf{y}$ should have maximum correlation with $H = \mathbf{w}'\mathbf{g}$ and should be useful for ranking and selecting among individuals with different net genetic merit; however, $\rho_{HI_R}$ is lower than the correlation between the LPSI and $H = \mathbf{w}'\mathbf{g}$ ($\rho_{HI}$) in each selection cycle because when the restriction $\mathbf{C}'\mathbf{b} = \mathbf{0}$ is imposed on the RLPSI vector of coefficients, the restricted traits do not affect the correlation $\rho_{HI_R}$. Using simulated data described in Sect. 2.8.1 of Chap. 2, we estimated $\rho_{HI_R}$ and $\rho_{HI}$ for seven selection cycles and compared the results in Fig. 3.2. Correlation $\rho_{HI_R}$ values were estimated for one, two, and three null restrictions and, in effect, they were lower than the estimated values of $\rho_{HI}$ in all selection cycles (Fig. 3.2). Additional results can be seen in Chap. 10, where the RLPSI was simulated for many selection cycles. Chapter 11 describes RIndSel: a program that uses R (in this case R denotes a platform for data analysis, see Kabakoff 2011 for details) and the selection index theory to select individual candidates for selection.

Fig. 3.2 Estimated correlation values between the linear phenotypic selection index (LPSI) and the net genetic merit ($H = \mathbf{w}'\mathbf{g}$); estimated correlation values between the RLPSI and H for one (red), two (yellow), and three (green) restrictions for four traits and 500 genotypes in one environment simulated for seven selection cycles

The maximized RLPSI selection response and the restricted expected genetic gain per trait can be written as

$$R_R = k_I \sqrt{\mathbf{b}_R' \mathbf{P} \mathbf{b}_R} \tag{3.8}$$

and

$$\mathbf{E}\_R = k\_I \frac{\mathbf{G} \mathbf{b}\_R}{\sqrt{\mathbf{b}\_R' \mathbf{P} \mathbf{b}\_R}},\tag{3.9}$$

respectively, where kI is the standardized selection differential or selection intensity associated with the RLPSI.

The maximized RLPSI selection response has the same form as the maximized LPSI selection response; thus, under $r$ restrictions, Eq. (3.8) predicts the mean improvement in $H$ owing to indirect selection on $I_R = \mathbf{b}'_R\mathbf{y}$ when $\mathbf{b}_R = \mathbf{Kb}$. The restriction effects are observed on the RLPSI expected genetic gains per trait (Eq. 3.9), where each restricted trait has an expected genetic gain equal to zero. In addition, because the RLPSI selection response and expected genetic gain per trait values are also affected by the restricted traits, they are lower than the LPSI selection response and expected genetic gain per trait values.

#### 3.1.2 Statistical Properties of the RLPSI

Under the assumptions that $H = \mathbf{w}'\mathbf{g}$ and $I_R = \mathbf{b}'_R\mathbf{y}$ have a bivariate joint normal distribution, $\mathbf{b}_R = \mathbf{Kb}$, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{Gw}$, and $\mathbf{P}$, $\mathbf{G}$, and $\mathbf{w}$ are known, the RLPSI has the following properties:


:

$$\rho\_{HI\_R} = \frac{\mathbf{w}' \mathbf{G} \mathbf{b}\_R}{\sqrt{\mathbf{w}' \mathbf{G} \mathbf{w}} \sqrt{\mathbf{b}\_R' \mathbf{P} \mathbf{b}\_R}} = \sqrt{\frac{\mathbf{b}\_R' \mathbf{P} \mathbf{b}\_R}{\mathbf{w}' \mathbf{G} \mathbf{w}}} = \frac{\sigma\_{I\_R}}{\sigma\_H}$$

8. The variance of the predicted error, $Var(H - I_R) = (1 - \rho^2_{HI_R})\sigma^2_H$, is minimal. By point 6, $\sigma_{HI_R} = \sigma^2_{I_R}$, whence $Var(H - I_R) = \sigma^2_H - \sigma^2_{I_R} = (1 - \rho^2_{HI_R})\sigma^2_H$.
9. RLPSI heritability is equal to $h^2_{I_R} = \frac{\mathbf{b}'_R\mathbf{Gb}_R}{\mathbf{b}'_R\mathbf{Pb}_R}$.

Points 1–4 show that in effect, the RLPSI projects the LPSI vector of coefficients into a space smaller than the original LPSI vector of coefficients. In addition, the RLPSI statistical properties denoted by points 5–9 are the same as the LPSI statistical properties. Thus, the RLPSI is a variant of the LPSI.

#### 3.1.3 The RLPSI Matrix of Restrictions

The main difference between the RLPSI and the LPSI is the restriction $\mathbf{U}'\mathbf{Gb} = \mathbf{0}$ used to obtain the RLPSI vector of coefficients. This restriction is introduced through matrix $\mathbf{U}'$, which has at most $(t - 1)$ rows and $t$ columns, is called the matrix of null restrictions, and is very important in an RLPSI context. The form and size of matrix $\mathbf{U}'$ depend on the number of restricted traits. For example, suppose that we restrict only one of $t$ traits; then we can restrict the first of them as $\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \end{bmatrix}$, the second as $\mathbf{U}' = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \end{bmatrix}$, the third as $\mathbf{U}' = \begin{bmatrix} 0 & 0 & 1 & \cdots & 0 \end{bmatrix}$, etc. When we restrict two of $t$ traits, matrix $\mathbf{U}'$ could be constructed as follows. We can restrict the first and second traits as $\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \end{bmatrix}$, the first and third traits as $\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \end{bmatrix}$, the second and third traits as $\mathbf{U}' = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \end{bmatrix}$, etc. If we restrict three of $t$ traits, matrix $\mathbf{U}'$ will have the following form when the first, second, and third traits are restricted,

$$\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{bmatrix};$$

if the first, second, and fourth traits are restricted,

$$\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \end{bmatrix};$$

and if the second, third, and fourth traits are restricted,

$$\mathbf{U}' = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \end{bmatrix},$$

etc. The procedure to construct matrix $\mathbf{U}'$ is valid for any number of restricted traits.

There are $\sum_{r=0}^{t}\binom{t}{r} = 2^t$ (Leon-Garcia 2008) possible forms for constructing matrix $\mathbf{U}'$, where $\binom{t}{r} = \frac{t!}{r!(t-r)!}$ and $t! = t(t-1)(t-2)(t-3)\cdots(t-(t-1))$. Note, however, that when $r = 0$, $\mathbf{U}'$ is a null matrix, and when $r = t$, all traits are restricted and then the RLPSI values are null. Thus, the breeder should be interested only in $2^t - 2$ possible ways of constructing matrix $\mathbf{U}'$.
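The count of admissible restriction matrices can be illustrated by enumerating them. The Python sketch below builds every $\mathbf{U}'$ with $0 < r < t$ restricted traits as a list of rows; the function name `restriction_matrices` is illustrative.

```python
from itertools import combinations


def restriction_matrices(t):
    """Yield every U' (list of rows) that restricts r traits, with 0 < r < t."""
    for r in range(1, t):
        for traits in combinations(range(t), r):
            # One row per restricted trait, with a single 1 in that trait's column.
            yield [[1 if j == i else 0 for j in range(t)] for i in traits]


all_U = list(restriction_matrices(4))
print(len(all_U))   # number of matrices for t = 4
print(all_U[0])     # U' restricting only the first trait
```

For $t = 4$ the enumeration contains $2^4 - 2$ matrices, in agreement with the count given above.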

#### 3.1.4 Numerical Examples

To illustrate the RLPSI theoretical results, we use the data set described in Sect. 2.8.1 of Chap. 2. We used that data set for seven phenotypic selection cycles (C1 to C7), each with four traits (T1, T2, T3 and T4), 500 genotypes and four replicates for each genotype. The economic weights for T1, T2, T3, and T4 were 1, -1, 1, and 1 respectively. The estimated phenotypic ($\widehat{\mathbf{P}}$) and genetic ($\widehat{\mathbf{G}}$) covariance matrices for traits T1, T2, T3, and T4 obtained for the first selection cycle (C1) of the simulated data were

$$
\begin{aligned}
\widehat{\mathbf{P}} &= \begin{bmatrix}
62.50 & -12.74 & 8.53 & 2.73 \\
-12.74 & 17.52 & -3.38 & -2.28 \\
8.53 & -3.38 & 12.31 & 0.16 \\
2.73 & -2.28 & 0.16 & 7.27
\end{bmatrix} \quad \text{and} \\
\widehat{\mathbf{G}} &= \begin{bmatrix}
36.21 & -12.93 & 8.35 & 2.74 \\
-12.93 & 13.04 & -3.40 & -2.24 \\
8.35 & -3.40 & 9.96 & 0.16 \\
2.74 & -2.24 & 0.16 & 6.64
\end{bmatrix},
\end{aligned}
$$

respectively. We can restrict T1 with matrix $\mathbf{U}'_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}$; T1 and T2 with matrix $\mathbf{U}'_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$; and T1, T2 and T3 with matrix $\mathbf{U}'_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$.

Matrix $\mathbf{C}' = \mathbf{U}'\widehat{\mathbf{G}}$ associated with $\mathbf{U}'_1$, $\mathbf{U}'_2$, and $\mathbf{U}'_3$ can be obtained as $\mathbf{C}'_1 = \mathbf{U}'_1\widehat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \end{bmatrix}$,

$$\mathbf{C}'\_2 = \mathbf{U}'\_2 \hat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \end{bmatrix}, \quad \text{and}$$

$$\mathbf{C}'\_3 = \mathbf{U}'\_3 \hat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \\ 8.35 & -3.40 & 9.96 & 0.16 \end{bmatrix}.$$

The estimated LPSI vector of coefficients was $\widehat{\mathbf{b}}' = \mathbf{w}'\widehat{\mathbf{G}}\widehat{\mathbf{P}}^{-1} = \begin{bmatrix} 0.55 & -1.05 & 1.09 & 1.06 \end{bmatrix}$.

The estimated matrices $\widehat{\mathbf{Q}} = \widehat{\mathbf{P}}^{-1}\mathbf{C}(\mathbf{C}'\widehat{\mathbf{P}}^{-1}\mathbf{C})^{-1}\mathbf{C}'$ and $\widehat{\mathbf{K}} = [\mathbf{I}_4 - \widehat{\mathbf{Q}}]$ (where $\mathbf{I}_4$ is an identity matrix of size $4 \times 4$) for 1 null restriction were

$$
\widehat{\mathbf{Q}}\_{1} = \widehat{\mathbf{P}}^{-1} \mathbf{C}\_{1} \left( \mathbf{C}\_{1}' \widehat{\mathbf{P}}^{-1} \mathbf{C}\_{1} \right)^{-1} \mathbf{C}\_{1}' = \begin{bmatrix} 0.72 & -0.26 & 0.17 & 0.05 \\ -0.51 & 0.18 & -0.12 & -0.04 \\ 0.39 & -0.14 & 0.09 & 0.03 \\ 0.14 & -0.05 & 0.03 & 0.01 \end{bmatrix} \quad \text{and} \quad
$$

$$
\widehat{\mathbf{K}}\_{1} = \begin{bmatrix} \mathbf{I}\_{4} - \widehat{\mathbf{Q}}\_{1} \end{bmatrix} = \begin{bmatrix} 0.28 & 0.26 & -0.17 & -0.05 \\ 0.51 & 0.82 & 0.12 & 0.04 \\ -0.39 & 0.14 & 0.91 & -0.03 \\ -0.14 & 0.05 & -0.03 & 0.99 \end{bmatrix}.
$$

Thus, the estimated RLPSI vector of coefficients was $\widehat{\mathbf{b}}_{R_1} = \widehat{\mathbf{K}}_1\widehat{\mathbf{b}}$, with $\widehat{\mathbf{b}}'_{R_1} = \begin{bmatrix} -0.35 & -0.41 & 0.59 & 0.89 \end{bmatrix}$, whence the estimated RLPSI for 1 null restriction can be written as $\widehat{I}_{R_1} = -0.35T_1 - 0.41T_2 + 0.59T_3 + 0.89T_4$. The average values of $T_1$, $T_2$, $T_3$, and $T_4$ were 164.46, 39.63, 34.66, and 23.11 (Table 3.1) respectively; then,

$$\widehat{I}_{R_1} = -0.35(164.46) - 0.41(39.63) + 0.59(34.66) + 0.89(23.11) = -33.24.$$

Table 3.1 Ten genotypes, mean values of four traits, and unranked and ranked values of the restricted linear phenotypic selection index (RLPSI) obtained from 500 simulated genotypes (each with four repetitions) and four traits ($T_1$, $T_2$, $T_3$, and $T_4$) in one environment for one selection cycle

In Table 3.1 we present ten genotypes, the mean values of four traits, and the unranked and ranked values of the RLPSI from 500 genotypes in one environment simulated for one selection cycle. The first part of Table 3.1 presents the ten unranked genotypes, whereas the second part presents the ten genotypes ranked by the estimated RLPSI values.
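The one-restriction calculation can be retraced from the printed values. The Python sketch below multiplies the reported $\widehat{\mathbf{K}}_1$ by the LPSI coefficients (taken to three decimals from Sect. 3.2.4) and evaluates the index at the trait means. Because the inputs are rounded, the output should match the reported coefficients only up to rounding, and the index value at the means comes out near, not exactly at, the reported figure. Variable and function names are illustrative.

```python
# K-hat_1 = I_4 - Q-hat_1 for one null restriction, as printed in the text.
K1 = [
    [0.28, 0.26, -0.17, -0.05],
    [0.51, 0.82, 0.12, 0.04],
    [-0.39, 0.14, 0.91, -0.03],
    [-0.14, 0.05, -0.03, 0.99],
]
# Estimated LPSI coefficients (three-decimal values from Sect. 3.2.4).
b = [0.554, -1.053, 1.090, 1.058]
# Mean values of T1..T4 quoted from Table 3.1.
means = [164.46, 39.63, 34.66, 23.11]

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

b_R1 = matvec(K1, b)                                      # RLPSI coefficients
index_at_means = sum(c * m for c, m in zip(b_R1, means))  # index at trait means

print([round(c, 2) for c in b_R1])
print(round(index_at_means, 2))
```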

Assuming a selection intensity of 10% ($k_I = 1.755$), the estimated selection response and the estimated expected genetic gain per trait for 1 null restriction were

$$\widehat{R}_{R_1} = 1.755\sqrt{\widehat{\mathbf{b}}'_{R_1}\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_1}} = 6.87 \quad \text{and} \quad \widehat{\mathbf{E}}'_{R_1} = 1.755\frac{\widehat{\mathbf{b}}'_{R_1}\widehat{\mathbf{G}}}{\sqrt{\widehat{\mathbf{b}}'_{R_1}\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_1}}} = \begin{bmatrix} 0 & -2.2 & 2.03 & 2.66 \end{bmatrix},$$

respectively, and the estimated correlation between the RLPSI and the net genetic merit was $\widehat{\rho}_{HI_{R_1}} = \sqrt{\dfrac{\widehat{\mathbf{b}}'_{R_1}\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_1}}{\mathbf{w}'\widehat{\mathbf{G}}\mathbf{w}}} = 0.35$.

In a similar manner to that for 1 null restriction, it is possible to obtain the estimated matrices $\widehat{\mathbf{Q}} = \widehat{\mathbf{P}}^{-1}\mathbf{C}(\mathbf{C}'\widehat{\mathbf{P}}^{-1}\mathbf{C})^{-1}\mathbf{C}'$ and $\widehat{\mathbf{K}} = \mathbf{I}_4 - \widehat{\mathbf{Q}}$, and the estimated RLPSI vector of coefficients for 2 and 3 null restrictions. Thus, for 2 and 3 null restrictions, the estimated selection responses were $\widehat{R}_{R_2} = 1.755\sqrt{\widehat{\mathbf{b}}'_{R_2}\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_2}} = 5.54$ and $\widehat{R}_{R_3} = 1.755\sqrt{\widehat{\mathbf{b}}'_{R_3}\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_3}} = 4.12$ respectively, whereas the estimated expected genetic gains per trait were

$$\widehat{\mathbf{E}}'_{R_2} = 1.755\frac{\widehat{\mathbf{b}}'_{R_2}\widehat{\mathbf{G}}}{\sqrt{\widehat{\mathbf{b}}'_{R_2}\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_2}}} = \begin{bmatrix} 0 & 0 & 2.773 & 2.768 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{E}}'_{R_3} = 1.755\frac{\widehat{\mathbf{b}}'_{R_3}\widehat{\mathbf{G}}}{\sqrt{\widehat{\mathbf{b}}'_{R_3}\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_3}}} = \begin{bmatrix} 0 & 0 & 0 & 4.12 \end{bmatrix}.$$

Note that the estimated RLPSI selection response decreased when the number of restrictions increased, and the estimated correlation between the RLPSI and the net genetic merit decreased likewise (Fig. 3.2). Also, the number of zeros in the expected genetic gain per trait increased from 1 to 3, matching the number of null restrictions.
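A quick consistency check on these figures: if the RLPSI variance equals its covariance with $H$ (the property stated for the PPG-LPSI in Sect. 3.2.2, which is said there to be shared with the RLPSI), then the economically weighted sum of the gains, $\mathbf{w}'\widehat{\mathbf{E}}$, should reproduce the selection response. The Python sketch below assumes $\mathbf{w}' = [1\ {-1}\ 1\ 1]$. These weights are not given in this section and are an assumption; the script also computes $\sqrt{\mathbf{w}'\widehat{\mathbf{G}}\mathbf{w}}$ so that the assumption can be compared with the value 11.202 quoted in Sect. 3.2.4.

```python
from math import sqrt

# Estimated genotypic covariance matrix G-hat, as reported in the text.
G = [
    [36.21, -12.93, 8.35, 2.74],
    [-12.93, 13.04, -3.40, -2.24],
    [8.35, -3.40, 9.96, 0.16],
    [2.74, -2.24, 0.16, 6.64],
]
# ASSUMPTION: economic weights (not stated in this section).
w = [1, -1, 1, 1]

# Reported expected genetic gains per trait and selection responses
# for 1, 2, and 3 null restrictions.
gains = {
    1: [0.0, -2.2, 2.03, 2.66],
    2: [0.0, 0.0, 2.773, 2.768],
    3: [0.0, 0.0, 0.0, 4.12],
}
responses = {1: 6.87, 2: 5.54, 3: 4.12}

# Standard deviation of the net genetic merit under the assumed weights.
sigma_H = sqrt(sum(w[i] * G[i][j] * w[j] for i in range(4) for j in range(4)))
# Weighted sum of gains and number of zero gains for each case.
weighted = {r: sum(wi * ei for wi, ei in zip(w, E)) for r, E in gains.items()}
zeros = {r: sum(1 for e in E if e == 0) for r, E in gains.items()}

print(round(sigma_H, 3))
for r in gains:
    print(r, zeros[r], round(weighted[r], 2), responses[r])
```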

Table 3.2 presents the estimated LPSI selection response and its heritabilities, and the estimated RLPSI selection response and its heritabilities for 1, 2, and 3 null restrictions for seven simulated selection cycles using a selection intensity of 10% ($k_I = 1.755$). Note that the averages of the estimated RLPSI selection response for the seven selection cycles were 6.76, 5.30, and 3.70 for 1, 2, and 3 null restrictions respectively, and that 3.70, the average value for 3 null restrictions, is only 54.73% of the average value for 1 null restriction (6.76). However, the estimated RLPSI heritabilities for 1, 2, and 3 null restrictions tend to increase. This is because the simulated true heritabilities of traits $T_1$, $T_2$, $T_3$, and $T_4$ were 0.4, 0.6, 0.6, and 0.8 respectively, whereas the averages of the estimated heritabilities of traits $T_1$, $T_2$, $T_3$, and $T_4$ were 0.70, 0.78, and 0.87 for 1, 2, and 3 null restrictions respectively.

Table 3.2 Estimated linear phenotypic selection index (LPSI) selection response and its heritability, and estimated restricted LPSI (RLPSI) selection response and its heritability for one, two, and three null restrictions for seven simulated selection cycles

The selection intensity was 10% ($k_I = 1.755$)

Table 3.3 Estimated LPSI expected genetic gain per trait, and estimated RLPSI expected genetic gain per trait for one, two, and three null restrictions for seven simulated selection cycles

The selection intensity was 10% ($k_I = 1.755$)

Table 3.3 presents the estimated LPSI expected genetic gain per trait and the estimated RLPSI expected genetic gain per trait for 1, 2, and 3 null restrictions for seven simulated selection cycles using a selection intensity of 10% ($k_I = 1.755$). In effect, due to the restriction $\mathbf{C}'\mathbf{b} = \mathbf{0}$, matrix $\mathbf{K}$ projects $\mathbf{b}$ into a space smaller than the original space of $\mathbf{b}$, and the reduction in the dimension of that space is equal to the number of zeros that appear in the RLPSI expected genetic gain per trait.

It can be shown that in the three-restriction case (Table 3.3) the estimated RLPSI expected genetic gain per trait (or multi-trait selection response) is equal to the one-trait selection response (Eqs. 2.4 and 2.5) when only trait $T_4$ is selected. This means that, in effect, when we imposed three restrictions on the RLPSI expected genetic gains per trait, we reduced a space of four dimensions to a space of only one dimension.

#### 3.2 The Predetermined Proportional Gains Linear Phenotypic Selection Index

This index is called the predetermined proportional gains linear phenotypic selection index (PPG-LPSI) because the breeder pre-sets optimal levels for certain traits before the selection is carried out. The conditions for constructing a valid PPG-LPSI are the same as those described for the LPSI in Sect. 2.1 of Chap. 2. Some of the main objectives of the PPG-LPSI are to optimize the expected genetic gain per trait, predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$, and select the individuals with the highest net genetic merit values as parents of the next generation. The PPG-LPSI allows restrictions different from zero to be imposed on the expected genetic gains of some traits, whereas other traits increase (or decrease) their expected genetic gains without imposing any restrictions. The PPG-LPSI solves the LPSI equations subject to the condition that the covariance between the LPSI and some linear functions of the genotypes involved be equal to a vector of predetermined constants or genetic gains defined by the breeder (Cunningham et al. 1970).

Let $\mathbf{d}' = \begin{bmatrix} d_1 & d_2 & \cdots & d_r \end{bmatrix}$ be the transpose of the $r \times 1$ vector $\mathbf{d}$ of predetermined proportional gains, and assume that $\mu_q$ is the population mean of the $q$th trait before selection. One objective could be to change $\mu_q$ to $\mu_q + d_q$, where $d_q$ is a predetermined change in $\mu_q$ (in the RLPSI, $d_q = 0$, $q = 1, 2, \ldots, r$, where $r$ is the number of predetermined proportional gains). We can solve this problem in a similar manner to that used with the RLPSI, that is, by minimizing the mean squared difference between $I$ and $H$, $E[(H - I)^2]$, under the restriction $\mathbf{D}'\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{0}$, where

$$\mathbf{D}' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix}$$

is a Mallard (1972) matrix $(r - 1) \times r$ of predetermined proportional gains, $d_q$ ($q = 1, 2, \ldots, r$) is the $q$th element of vector $\mathbf{d}'$, $\mathbf{U}'$ is the RLPSI matrix of restrictions of 1's and 0's described earlier in this chapter, $\mathbf{G}$ is the covariance matrix of genotypic values, and $\mathbf{b}$ is the LPSI vector of coefficients. Also, it is possible to minimize $E[(H - I)^2]$ under the restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$ (Tallis 1985), where $\theta$ is a proportionality constant, which is a scalar to be determined a posteriori (Lin 2005), that is, $\theta$ is indeterminate a priori (Itoh and Yamada 1987). Both approaches are very similar, but the equations obtained when introducing the $\mathbf{D}'\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{0}$ restriction are simpler than those obtained when introducing the $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$ restriction into the process of minimizing $E[(H - I)^2]$. The $\mathbf{D}'\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{0}$ restriction leads to a set of equations similar to Eq. (3.5), whereas the $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$ restriction leads to a set of equations that are difficult to solve.
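The Mallard (1972) matrix is simple to build programmatically. The Python sketch below constructs $\mathbf{D}'$ from a vector of predetermined gains and checks that $\mathbf{D}'\mathbf{d} = \mathbf{0}$, the condition that Itoh and Yamada (1987) use to characterize such matrices (Sect. 3.2.3). The function name `mallard_matrix` is illustrative and not from the source.

```python
def mallard_matrix(d):
    """Return D' with (r - 1) rows and r columns.

    Row q has d_r in column q and -d_q in the last column, so D'd = 0.
    """
    r = len(d)
    rows = []
    for q in range(r - 1):
        row = [0] * r
        row[q] = d[-1]     # d_r on the diagonal
        row[-1] = -d[q]    # -d_q in the last column
        rows.append(row)
    return rows

d = [7, -3, 5]             # predetermined proportional gains
D_t = mallard_matrix(d)
check = [sum(x * y for x, y in zip(row, d)) for row in D_t]  # D'd

print(D_t)
print(check)
```

With `d = [7, -3]` the same function returns the single row `[-3, -7]` used in the two-restriction numerical example of Sect. 3.2.4.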

#### 3.2.1 The Maximized PPG-LPSI Parameters

Let $\mathbf{M}' = \mathbf{D}'\mathbf{C}'$ be the Mallard (1972) matrix of predetermined restrictions, where $\mathbf{C}' = \mathbf{U}'\mathbf{G}$. Under the restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$, we can minimize $E[(I - H)^2]$, assuming that $\mathbf{P}$, $\mathbf{G}$, $\mathbf{U}'$, $\mathbf{D}'$, and $\mathbf{w}$ are known; that is, we need to minimize the function

$$\Phi(\mathbf{b}, \mathbf{v}) = \mathbf{b}' \mathbf{P} \mathbf{b} + \mathbf{w}' \mathbf{G} \mathbf{w} - 2\mathbf{w}' \mathbf{G} \mathbf{b} + 2\mathbf{v}' \mathbf{M}' \mathbf{b} \tag{3.10}$$

with respect to vectors $\mathbf{b}$ and $\mathbf{v}' = \begin{bmatrix} v_1 & v_2 & \cdots & v_{r-1} \end{bmatrix}$, where $\mathbf{v}$ is a vector of Lagrange multipliers. Note that the only difference between Eqs. (3.1) and (3.10) is matrix $\mathbf{D}'$, and that matrix $\mathbf{M}' = \mathbf{D}'\mathbf{C}'$ has the same function in Eq. (3.10) that matrix $\mathbf{C}' = \mathbf{U}'\mathbf{G}$ had in Eq. (3.1). Then, the derivatives of Eq. (3.10) with respect to $\mathbf{b}$ and $\mathbf{v}$ should be similar to those of Eq. (3.1), i.e.,

$$
\begin{bmatrix}
\mathbf{P} & \mathbf{M} \\
\mathbf{M'} & \mathbf{0}
\end{bmatrix}
\begin{bmatrix}
\mathbf{b} \\
\mathbf{v}
\end{bmatrix} = \begin{bmatrix}
\mathbf{G}\mathbf{w} \\
\mathbf{0}
\end{bmatrix}
$$

whence the vector that minimizes $E[(H - I)^2]$ under the restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$ is

$$\mathbf{b}\_M = \mathbf{K}\_M \mathbf{b},\tag{3.11}$$

where $\mathbf{K}_M = [\mathbf{I}_t - \mathbf{Q}_M]$, $\mathbf{Q}_M = \mathbf{P}^{-1}\mathbf{M}(\mathbf{M}'\mathbf{P}^{-1}\mathbf{M})^{-1}\mathbf{M}' = \mathbf{P}^{-1}\mathbf{C}\mathbf{D}(\mathbf{D}'\mathbf{C}'\mathbf{P}^{-1}\mathbf{C}\mathbf{D})^{-1}\mathbf{D}'\mathbf{C}'$, and $\mathbf{I}_t$ is an identity matrix of size $t \times t$. When $\mathbf{D} = \mathbf{U}$, $\mathbf{b}_M = \mathbf{b}_R$ (the RLPSI vector of coefficients), and when $\mathbf{D} = \mathbf{U}$ and $\mathbf{U}'$ is a null matrix, $\mathbf{b}_M = \mathbf{b}$ (the LPSI vector of coefficients). Thus, the Mallard (1972) index is more general than the RLPSI and is an optimal PPG-LPSI. In addition, it includes the LPSI and the RLPSI as particular cases.

Instead of using restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$ to minimize $E[(I - H)^2]$, we can use restriction $\mathbf{C}'\mathbf{b} = \theta\mathbf{d}$ and minimize

$$\Phi\_T(\mathbf{b}, \mathbf{v}) = \mathbf{b}' \mathbf{P} \mathbf{b} + \mathbf{w}' \mathbf{G} \mathbf{w} - 2\mathbf{w}' \mathbf{G} \mathbf{b} + 2\mathbf{v}'(\mathbf{C}' \mathbf{b} - \theta \mathbf{d}) \tag{3.12}$$

with respect to $\mathbf{b}$, $\mathbf{v}'$, and $\theta$ (Tallis 1985; Lin 2005), assuming that $\mathbf{P}$, $\mathbf{G}$, $\mathbf{U}'$, $\mathbf{d}$, and $\mathbf{w}$ are known. The derivative results in matrix notation are

$$
\begin{bmatrix}
\mathbf{b}\_T \\
\mathbf{v} \\
\theta
\end{bmatrix} = \begin{bmatrix}
\mathbf{P} & \mathbf{C} & \mathbf{0}\_{t \times 1} \\
\mathbf{C}' & \mathbf{0}\_{r \times r} & -\mathbf{d} \\
\mathbf{0}'\_{1 \times t} & -\mathbf{d}' & 0
\end{bmatrix}^{-1} \begin{bmatrix}
\mathbf{G}\mathbf{w} \\
\mathbf{0} \\
0
\end{bmatrix},\tag{3.13}
$$

where $\mathbf{0}_{t \times 1}$ is a null vector $t \times 1$, $\mathbf{0}_{r \times r}$ is a null matrix $r \times r$ (the block that multiplies the $r \times 1$ vector $\mathbf{v}$), $\mathbf{0}$ is a null column vector $r \times 1$, and 0 is the standard zero value. The inverse of the matrix of coefficients in Eq. (3.13) is not easy to obtain; for this reason, Tallis (1985) obtained his results in two steps. That is, Tallis (1985) first differentiated Eq. (3.12) with respect to $\mathbf{b}$ and $\mathbf{v}'$, whence he obtained

$$\mathbf{b}\_{\rm T} = \mathbf{b}\_{\rm R} + \theta\boldsymbol{\delta},\tag{3.14}$$

where $\mathbf{b}_R = \mathbf{K}\mathbf{b}$ (Eq. 3.5), $\boldsymbol{\delta} = \mathbf{P}^{-1}\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{d}$, and $\mathbf{d}' = \begin{bmatrix} d_1 & d_2 & \cdots & d_r \end{bmatrix}$. Next, he differentiated $E[(\mathbf{b}'_T\mathbf{y} - H)^2]$ only with respect to $\theta$, and his result was

$$\theta = \frac{\mathbf{b}^{\prime}\mathbf{C}\left(\mathbf{C}^{\prime}\mathbf{P}^{-1}\mathbf{C}\right)^{-1}\mathbf{d}}{\mathbf{d}^{\prime}\left(\mathbf{C}^{\prime}\mathbf{P}^{-1}\mathbf{C}\right)^{-1}\mathbf{d}},\tag{3.15}$$

where $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ is the LPSI vector of coefficients, $\mathbf{C}' = \mathbf{U}'\mathbf{G}$, $\mathbf{d}$ is the vector of the predetermined proportional gains imposed by the breeder, and $\mathbf{P}^{-1}$ is the inverse of matrix $\mathbf{P}$. When $\theta = 0$, $\mathbf{b}_T = \mathbf{b}_R$, and if $\theta = 0$ and $\mathbf{U}'$ is the null matrix, $\mathbf{b}_T = \mathbf{b}$. That is, the PPG-LPSI obtained by Tallis (1985) is more general than the RLPSI and the LPSI. The foregoing results indicate that Eq. (3.14) consists of three parts:


When $\theta = 1$, Eq. (3.14) is equal to

$$\mathbf{b}\_{\rm T\_0} = \mathbf{b}\_R + \mathbf{\delta}.\tag{3.16}$$

The latter equation was the original result obtained by Tallis (1962). Tallis (1962) differentiated Eq. (3.12) with respect to vectors $\mathbf{b}$ and $\mathbf{v}$ under the restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{d}$, i.e., without $\theta$ or with $\theta = 1$. Later, James (1968) maximized the correlation between $I$ and $H$ ($\rho_{HI}$) under the Tallis (1962) restriction and once more obtained Eq. (3.16). Mallard (1972) showed that Eq. (3.16) is not optimal, i.e., it does not minimize $E[(I - H)^2]$ and does not maximize $\rho_{HI}$, and gave the optimal solution, which we have presented here in Eq. (3.11). Later, using restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$, Tallis (1985) obtained Eq. (3.14), which is also optimal.

Figure 3.3 presents the estimated correlation values between the PPG-LPSI and the net genetic merit ($H = \mathbf{w}'\mathbf{g}$) for the optimal PPG-LPSI (Eq. 3.14) and the non-optimal PPG-LPSI (Eq. 3.16) using one ($d_1 = 7$), two ($\mathbf{d}' = \begin{bmatrix} 7 & -3 \end{bmatrix}$), and three ($\mathbf{d}' = \begin{bmatrix} 7 & -3 & 5 \end{bmatrix}$) predetermined restrictions, four traits, and 500 simulated genotypes in one environment for seven selection cycles (see Sect. 2.8.1 of Chap. 2). Note that, in effect, the non-optimal PPG-LPSI has lower correlations than the optimal PPG-LPSI for the seven simulated selection cycles.

Fig. 3.3 Estimated correlation values between the predetermined proportional gain linear phenotypic selection index (PPG-LPSI) and the net genetic merit ($H = \mathbf{w}'\mathbf{g}$) for the optimal and non-optimal PPG-LPSI using 1 ($d_1 = 7$), 2 ($\mathbf{d}' = \begin{bmatrix} 7 & -3 \end{bmatrix}$), and 3 ($\mathbf{d}' = \begin{bmatrix} 7 & -3 & 5 \end{bmatrix}$) predetermined restrictions, 4 traits and 500 simulated genotypes in 1 environment for 7 selection cycles

Let $\mathbf{b}_P = \mathbf{b}_M = \mathbf{b}_T$ be the PPG-LPSI vector of coefficients. Then, the optimal PPG-LPSI can be written as

$$I\_P = \mathbf{b}\_P' \mathbf{y},\tag{3.17}$$

whereas the maximized correlation between the PPG-LPSI and the net genetic merit is

$$\rho\_{HI\_P} = \frac{\mathbf{w}' \mathbf{G} \mathbf{b}\_P}{\sqrt{\mathbf{w}' \mathbf{G} \mathbf{w}} \sqrt{\mathbf{b}\_P' \mathbf{P} \mathbf{b}\_P}} \,. \tag{3.18}$$

According to the conditions for constructing a valid PPG-LPSI described in Sect. 2.1 of Chap. 2, the index $I_P = \mathbf{b}'_P\mathbf{y}$ should be normally distributed. Figure 3.4 presents the distribution of 500 estimated PPG-LPSI values with two ($\mathbf{d}' = \begin{bmatrix} 7 & -3 \end{bmatrix}$) and three ($\mathbf{d}' = \begin{bmatrix} 7 & -3 & 5 \end{bmatrix}$) predetermined restrictions respectively, obtained from one selection cycle, with four traits and 500 genotypes simulated in one environment (see Chap. 2, Sect. 2.8.1 for details). Figure 3.4 indicates that, in effect, the PPG-LPSI values approach a normal distribution.

Fig. 3.4 (a) and (b) show the distribution of 500 estimated predetermined proportional gain linear phenotypic selection index values with two ($\mathbf{d}' = \begin{bmatrix} 7 & -3 \end{bmatrix}$) and three ($\mathbf{d}' = \begin{bmatrix} 7 & -3 & 5 \end{bmatrix}$) predetermined restrictions respectively, obtained from one selection cycle for 500 genotypes and four traits simulated in one environment

Under the predetermined restrictions imposed by the breeder, $I_P = \mathbf{b}'_P\mathbf{y}$ should have maximal correlation with $H = \mathbf{w}'\mathbf{g}$ and it should be useful for ranking and selecting among individuals with different net genetic merits. However, for more than two restrictions the proportionality constant ($\theta$) could be lower than 1; in that case, $\rho_{HI_P}$ is lower than the correlation between the LPSI and $H = \mathbf{w}'\mathbf{g}$ ($\rho_{HI}$). In addition, when the restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$ or $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$ is imposed on the PPG-LPSI vector of coefficients, the restricted traits decrease their effect on the correlation between the PPG-LPSI and $H = \mathbf{w}'\mathbf{g}$. Using the simulated data set described in Sect. 2.8.1 of Chap. 2, we estimated $\rho_{HI_P}$ and $\rho_{HI}$ for seven selection cycles and compared the results in Fig. 3.5. Correlation $\rho_{HI_P}$ values were estimated using one ($d_1 = 7$), two ($\mathbf{d}' = \begin{bmatrix} 7 & -3 \end{bmatrix}$), and three ($\mathbf{d}' = \begin{bmatrix} 7 & -3 & 5 \end{bmatrix}$) predetermined restrictions. Figure 3.5 indicates that when the number of predetermined restrictions is equal to or higher than two, the estimated values of $\rho_{HI_P}$ decrease more than when only one predetermined restriction is imposed on the PPG-LPSI.

The maximized PPG-LPSI selection response and expected genetic gains per trait can be written as

$$R\_P = k\_I \sqrt{\mathbf{b}\_M' \mathbf{P} \mathbf{b}\_M} = k\_I \sqrt{\mathbf{b}\_T' \mathbf{P} \mathbf{b}\_T} \tag{3.19}$$

and

$$\mathbf{E}\_{\rm P} = k\_I \frac{\mathbf{G} \mathbf{b}\_M}{\sqrt{\mathbf{b}\_M^{\prime} \mathbf{P} \mathbf{b}\_M}} = k\_I \frac{\mathbf{G} \mathbf{b}\_T}{\sqrt{\mathbf{b}\_T^{\prime} \mathbf{P} \mathbf{b}\_T}},\tag{3.20}$$

respectively, where $k_I$ is the standardized selection differential or selection intensity associated with the PPG-LPSI.

The maximized PPG-LPSI selection response (Eq. 3.19) has the same form as the maximized LPSI selection response. Thus, under $r$ predetermined restrictions, Eq. (3.19) predicts the mean improvement in $H$ due to indirect selection on $I_P = \mathbf{b}'_P\mathbf{y}$. Predetermined restriction effects are observed on the PPG-LPSI expected genetic gain per trait (Eq. 3.20). The main difference between the RLPSI and the PPG-LPSI is the vector of predetermined proportional gains.

#### 3.2.2 Statistical Properties of the PPG-LPSI

Assuming that $H = \mathbf{w}'\mathbf{g}$ and $I_P = \mathbf{b}'_P\mathbf{y}$ have a bivariate joint normal distribution, $\mathbf{b}_P = \mathbf{K}_M\mathbf{b}$, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$, and $\mathbf{P}$, $\mathbf{G}$, and $\mathbf{w}$ are known, the PPG-LPSI has the same properties as the RLPSI. Some of the main PPG-LPSI properties are:


$$
\sigma\_{I\_P}^2 = \mathbf{b}\_P^\prime \mathbf{P} \mathbf{b}\_P = \mathbf{b}^\prime \mathbf{K}\_M^\prime \mathbf{P} \mathbf{K}\_M \mathbf{b} = \mathbf{b}^\prime \mathbf{P} \mathbf{K}\_M^2 \mathbf{b} = \mathbf{b}^\prime \mathbf{P} \mathbf{K}\_M \mathbf{b} = \mathbf{w}^\prime \mathbf{G} \mathbf{b}\_P = \sigma\_{HI\_P}.
$$

5. The maximized correlation between $H$ and $I_P = \mathbf{b}'_P\mathbf{y}$ is equal to $\rho_{HI_P} = \sigma_{I_P}/\sigma_H$. In point 4 of this subsection, we showed that $\sigma_{HI_P} = \sigma^2_{I_P}$; then

$$\rho\_{HI\_P} = \frac{\mathbf{w}' \mathbf{G} \mathbf{b}\_P}{\sqrt{\mathbf{w}' \mathbf{G} \mathbf{w}} \sqrt{\mathbf{b}\_P' \mathbf{P} \mathbf{b}\_P}} = \sqrt{\frac{\mathbf{b}\_P' \mathbf{P} \mathbf{b}\_P}{\mathbf{w}' \mathbf{G} \mathbf{w}}} = \frac{\sigma\_{I\_P}}{\sigma\_H}.$$

6. The variance of the predicted error, $Var(H - I_P) = (1 - \rho^2_{HI_P})\sigma^2_H$, is minimal. By point 4 of this subsection, $\sigma_{HI_P} = \sigma^2_{I_P}$; then $Var(H - I_P) = \sigma^2_H - \sigma^2_{I_P} = (1 - \rho^2_{HI_P})\sigma^2_H$.

7. The heritability of the PPG-LPSI is equal to $h^2_{I_P} = \dfrac{\mathbf{b}'_P\mathbf{G}\mathbf{b}_P}{\mathbf{b}'_P\mathbf{P}\mathbf{b}_P}$.

Points 1–3 show that in effect, the PPG-LPSI projects the LPSI vector of coefficients into a different space than the original LPSI vector of coefficients. In addition, the PPG-LPSI statistical properties denoted by points 4–7 are the same as the LPSI statistical properties. Thus, the PPG-LPSI is a variant of the LPSI.

#### 3.2.3 There Is Only One Optimal PPG-LPSI

Let $\mathbf{S} = \mathbf{C}'\mathbf{P}^{-1}\mathbf{C}$. Under the restriction $\mathbf{D}'\mathbf{d} = \mathbf{0}$, Itoh and Yamada (1987) showed that $\mathbf{D}(\mathbf{D}'\mathbf{S}\mathbf{D})^{-1}\mathbf{D}' = \mathbf{S}^{-1} - \mathbf{S}^{-1}\mathbf{d}(\mathbf{d}'\mathbf{S}^{-1}\mathbf{d})^{-1}\mathbf{d}'\mathbf{S}^{-1}$, whence substituting $\mathbf{S}^{-1} - \mathbf{S}^{-1}\mathbf{d}(\mathbf{d}'\mathbf{S}^{-1}\mathbf{d})^{-1}\mathbf{d}'\mathbf{S}^{-1}$ for $\mathbf{D}(\mathbf{D}'\mathbf{S}\mathbf{D})^{-1}\mathbf{D}'$ in matrix $\mathbf{Q}_M$, Eq. (3.11) can be written as Eq. (3.14), i.e., $\mathbf{b}_M = \mathbf{b}_T$. Therefore, the Mallard (1972) and Tallis (1985) vectors of coefficients are the same. In addition, Itoh and Yamada (1987) showed that the Harville (1975) vector of coefficients can be written as $\mathbf{b}_T/\sigma_{I_T}$ (Eq. 2.21d), where $\sigma_{I_T}$ is the standard deviation of the Tallis (1985) PPG-LPSI. Thus, in reality, there is only one optimal PPG-LPSI.
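The equality $\mathbf{b}_M = \mathbf{b}_T$ can also be checked numerically. The plain-Python sketch below is an illustration, not the authors' code. It uses $\widehat{\mathbf{G}}$ from Sect. 3.1.4 and $\mathbf{d}' = [7\ {-3}\ 5]$, but two inputs are assumptions: the economic weights $\mathbf{w}' = [1\ {-1}\ 1\ 1]$ and a hypothetical phenotypic covariance matrix (here $\widehat{\mathbf{G}}$ plus made-up residual variances on the diagonal), chosen only to keep the example self-contained. It computes the Mallard vector (Eq. 3.11) and the Tallis vector (Eqs. 3.14 and 3.15), compares them, and also evaluates property 4 of Sect. 3.2.2.

```python
def T(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def mm(A, B):
    """Matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [row[n:] for row in M]

G = [
    [36.21, -12.93, 8.35, 2.74],
    [-12.93, 13.04, -3.40, -2.24],
    [8.35, -3.40, 9.96, 0.16],
    [2.74, -2.24, 0.16, 6.64],
]
# HYPOTHETICAL phenotypic matrix: G plus made-up residual variances.
resid = [5.0, 2.0, 2.0, 1.0]
P = [[G[i][j] + (resid[i] if i == j else 0.0) for j in range(4)]
     for i in range(4)]
w = [[1.0], [-1.0], [1.0], [1.0]]            # ASSUMED economic weights
d = [[7.0], [-3.0], [5.0]]                   # predetermined gains
C = T(G[:3])                                 # C = GU (U' picks rows 1-3 of G)
D = T([[5.0, 0.0, -7.0], [0.0, 5.0, 3.0]])   # Mallard matrix, D'd = 0

Pinv = inv(P)
b = mm(Pinv, mm(G, w))                       # LPSI: b = P^{-1} G w

# Mallard (Eq. 3.11): b_M = (I - Q_M) b with M = CD.
M = mm(C, D)
PiM = mm(Pinv, M)
Q_M = mm(mm(PiM, inv(mm(T(M), PiM))), T(M))
b_M = [[bi[0] - qb[0]] for bi, qb in zip(b, mm(Q_M, b))]

# Tallis (Eqs. 3.14 and 3.15): b_T = b_R + theta * delta.
PiC = mm(Pinv, C)
Sinv = inv(mm(T(C), PiC))                    # S^{-1} = (C' P^{-1} C)^{-1}
Q = mm(mm(PiC, Sinv), T(C))
b_R = [[bi[0] - qb[0]] for bi, qb in zip(b, mm(Q, b))]
delta = mm(PiC, mm(Sinv, d))
theta = (mm(mm(T(b), C), mm(Sinv, d))[0][0]
         / mm(T(d), mm(Sinv, d))[0][0])
b_T = [[br[0] + theta * de[0]] for br, de in zip(b_R, delta)]

gains = [row[0] for row in mm(T(C), b_M)]    # C'b_M, proportional to d
var_I = mm(T(b_M), mm(P, b_M))[0][0]         # b_M' P b_M
cov_HI = mm(T(w), mm(G, b_M))[0][0]          # w' G b_M

print([round(x[0], 4) for x in b_M])
print([round(x[0], 4) for x in b_T])
print([round(g / gains[0] * 7, 4) for g in gains])
print(round(var_I, 6), round(cov_HI, 6))
```

The two coefficient vectors should agree to numerical precision, and the rescaled gains of the restricted traits should come out in the ratio 7 : −3 : 5, whatever positive-definite matrix is supplied for the phenotypic covariances.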

Itoh and Yamada (1987) also pointed out that matrix

$$\mathbf{D}' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix}$$

is only one example of several possible Mallard (1972) $\mathbf{D}'$ matrices. They showed that any matrix $\mathbf{D}'$ that satisfies condition $\mathbf{D}'\mathbf{d} = \mathbf{0}$ is another Mallard (1972) matrix of predetermined proportional gains. According to Itoh and Yamada (1987), matrices

$$\mathbf{D}' = \begin{bmatrix} d\_2 & -d\_1 & 0 & \cdots & 0 & 0\\ 0 & d\_3 & -d\_2 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & d\_r & -d\_{r-1} \end{bmatrix} \quad \text{and} \quad$$

$$\mathbf{D}' = \begin{bmatrix} d\_2 & -d\_1 & 0 & \cdots & 0\\ d\_3 & 0 & -d\_1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ d\_r & 0 & 0 & \cdots & -d\_1 \end{bmatrix}$$

are also Mallard (1972) matrices of predetermined proportional gains because they satisfy condition $\mathbf{D}'\mathbf{d} = \mathbf{0}$. However, matrix

$$\mathbf{D}' = \begin{bmatrix} d\_r & 0 & \cdots & 0 & -d\_1 \\ 0 & d\_r & \cdots & 0 & -d\_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d\_r & -d\_{r-1} \end{bmatrix}$$

is "easier" to construct.

Harville (1975) maximized the correlation between $I$ and $H$ ($\rho_{IH}$) under the restriction $\mathbf{C}'\mathbf{b} = \theta\mathbf{d}$ and was the first to point out the importance of the proportionality constant ($\theta$) in the PPG-LPSI. Mallard (1972) showed that the restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{d}$ does not maximize the correlation with the net genetic merit ($H = \mathbf{w}'\mathbf{g}$), and Harville (1975) indicated that the restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{d}$ only changes the sign of the expected genetic gain (or multi-trait selection response) but does not maximize the correlation between $I = \mathbf{b}'\mathbf{y}$ and $H = \mathbf{w}'\mathbf{g}$. According to Mallard (1972), Harville (1975), and Tallis (1985), the PPG-LPSI is optimal only under the restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$.

Itoh and Yamada (1987) pointed out several problems associated with the Tallis (1985) PPG-LPSI:


Itoh and Yamada (1987) thought that one possible solution to those problems could be to use the linear phenotypic selection index with desired gains.

#### 3.2.4 Numerical Examples

The estimated phenotypic ($\widehat{\mathbf{P}}$) and genetic ($\widehat{\mathbf{G}}$) covariance matrices described in Sect. 3.1.4 of this chapter for the RLPSI are used as the first example. First, Eq. (3.11) is used to obtain the PPG-LPSI vector of coefficients. Let $\mathbf{d}'_2 = \begin{bmatrix} 7 & -3 \end{bmatrix}$ be the vector for 2 predetermined restrictions; then, the Mallard (1972) matrix is $\mathbf{D}' = \begin{bmatrix} -3 & -7 \end{bmatrix}$, while matrix $\mathbf{U}'$ is $\mathbf{U}'_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$. Matrix $\mathbf{M}' = \mathbf{D}'\mathbf{U}'\widehat{\mathbf{G}}$ for 2 predetermined restrictions will be $\mathbf{M}' = \mathbf{D}'\mathbf{U}'_2\widehat{\mathbf{G}} = \begin{bmatrix} -18.12 & -52.49 & -1.25 & 7.46 \end{bmatrix}$, whence

$$
\widehat{\mathbf{Q}}\_M = \widehat{\mathbf{P}}^{-1} \mathbf{M} \left( \mathbf{M}' \widehat{\mathbf{P}}^{-1} \mathbf{M} \right)^{-1} \mathbf{M}' = \begin{bmatrix} 0.084 & 0.242 & 0.006 & -0.034 \\ 0.313 & 0.906 & 0.022 & -0.129 \\ 0.037 & 0.106 & 0.003 & -0.015 \\ -0.019 & -0.055 & -0.001 & 0.008 \end{bmatrix} \quad \text{and} \quad
$$

$$
\widehat{\mathbf{K}}\_M = \begin{bmatrix} \mathbf{I}\_4 - \widehat{\mathbf{Q}}\_M \end{bmatrix} = \begin{bmatrix} 0.916 & -0.242 & -0.006 & 0.034 \\ -0.313 & 0.094 & -0.022 & 0.129 \\ -0.037 & -0.106 & 0.997 & 0.015 \\ 0.019 & 0.055 & 0.001 & 0.992 \end{bmatrix};$$

$\mathbf{I}_4$ is an identity matrix of size $4 \times 4$.

The estimated LPSI and PPG-LPSI vectors of coefficients were $\widehat{\mathbf{b}}' = \begin{bmatrix} 0.554 & -1.053 & 1.090 & 1.058 \end{bmatrix}$ and $\widehat{\mathbf{b}}'_M = (\widehat{\mathbf{K}}_M\widehat{\mathbf{b}})' = \begin{bmatrix} 0.793 & -0.159 & 1.194 & 1.004 \end{bmatrix}$ respectively, and the estimated PPG-LPSI was $\widehat{I}_M = 0.793T_1 - 0.159T_2 + 1.194T_3 + 1.004T_4$. The estimated standard deviation of $\widehat{I}_M$ was $\widehat{\sigma}_{I_M} = \sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M} = 9.526$, whereas the estimated correlation value between the PPG-LPSI and the net genetic merit was $\widehat{\rho}_{HI_P} = \widehat{\sigma}_{I_M}/\widehat{\sigma}_H = 0.85$, where $\widehat{\sigma}_H = \sqrt{\mathbf{w}'\widehat{\mathbf{G}}\mathbf{w}} = 11.202$ is the estimated standard deviation of the net genetic merit.

Suppose that the selection intensity was 10% ($k_I = 1.755$); then, the estimated PPG-LPSI expected genetic gain per trait and the estimated selection response are $\widehat{\mathbf{E}}'_M = 1.755\dfrac{\widehat{\mathbf{b}}'_M\widehat{\mathbf{G}}}{\sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M}} = \begin{bmatrix} 8.013 & -3.434 & 3.541 & 1.730 \end{bmatrix}$ and $\widehat{R}_M = (1.755)\sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M} = (1.755)(9.526) = 16.717$ respectively.
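The two-restriction example can be retraced from the printed quantities. The Python sketch below forms $\mathbf{M}'$ from $\mathbf{D}'$ and the first two rows of $\widehat{\mathbf{G}}$, obtains $\widehat{\mathbf{K}}_M$ as $\mathbf{I}_4 - \widehat{\mathbf{Q}}_M$, and evaluates the coefficients, the gains per trait, and the response. The inputs are the rounded values printed in the text, so the results should agree with the reported figures only up to rounding. Variable names are illustrative.

```python
G = [
    [36.21, -12.93, 8.35, 2.74],
    [-12.93, 13.04, -3.40, -2.24],
    [8.35, -3.40, 9.96, 0.16],
    [2.74, -2.24, 0.16, 6.64],
]
# Q-hat_M for two predetermined restrictions, as printed in the text.
Q_M = [
    [0.084, 0.242, 0.006, -0.034],
    [0.313, 0.906, 0.022, -0.129],
    [0.037, 0.106, 0.003, -0.015],
    [-0.019, -0.055, -0.001, 0.008],
]
b = [0.554, -1.053, 1.090, 1.058]   # estimated LPSI coefficients
d = [7, -3]                          # predetermined gains
D_t = [d[1], -d[0]]                  # Mallard matrix D' = [d_2, -d_1]
k_I, sigma_I = 1.755, 9.526          # selection intensity, reported SD of the index

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

K_M = [[(1.0 if i == j else 0.0) - Q_M[i][j] for j in range(4)]
       for i in range(4)]
# M' = D'U_2'G: weighted combination of the first two rows of G.
M_t = [D_t[0] * g1 + D_t[1] * g2 for g1, g2 in zip(G[0], G[1])]
b_M = matvec(K_M, b)                              # PPG-LPSI coefficients
E = [k_I * g / sigma_I for g in matvec(G, b_M)]   # expected gains per trait
R = k_I * sigma_I                                 # selection response

print([round(m, 2) for m in M_t])
print([round(c, 3) for c in b_M])
print([round(e, 2) for e in E], round(E[0] / E[1], 2))
print(round(R, 2))
```

The ratio of the first two gains should be close to $7/(-3)$, which is the proportionality that the restriction is designed to enforce.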

Now, let $\mathbf{d}'_3 = \begin{bmatrix} 7 & -3 & 5 \end{bmatrix}$ be the vector for three predetermined restrictions; then there are three possible predetermined Mallard matrices, i.e., $\mathbf{D}'_1 = \begin{bmatrix} 5 & 0 & -7 \\ 0 & 5 & 3 \end{bmatrix}$, $\mathbf{D}'_2 = \begin{bmatrix} -3 & -7 & 0 \\ 0 & 5 & 3 \end{bmatrix}$, and $\mathbf{D}'_3 = \begin{bmatrix} -3 & -7 & 0 \\ 5 & 0 & -7 \end{bmatrix}$, and matrix $\mathbf{U}'$ for three restrictions is $\mathbf{U}'_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$. Thus, for three predetermined restrictions matrix $\mathbf{M}' = \mathbf{D}'\mathbf{U}'\widehat{\mathbf{G}}$ shall have three possible forms,

i.e.,

$$\mathbf{M}'_1 = \mathbf{D}'_1\mathbf{U}'_3\widehat{\mathbf{G}} = \begin{bmatrix} 122.60 & -40.85 & -27.97 & 12.58 \\ -39.60 & 55.00 & 12.88 & -10.72 \end{bmatrix},$$

$$\mathbf{M}'_2 = \mathbf{D}'_2\mathbf{U}'_3\widehat{\mathbf{G}} = \begin{bmatrix} -18.12 & -52.49 & -1.25 & 7.46 \\ -39.60 & 55.00 & 12.88 & -10.72 \end{bmatrix}, \quad \text{and}$$

$$\mathbf{M}'_3 = \mathbf{D}'_3\mathbf{U}'_3\widehat{\mathbf{G}} = \begin{bmatrix} -18.12 & -52.49 & -1.25 & 7.46 \\ 122.60 & -40.85 & -27.97 & 12.58 \end{bmatrix}.$$

Note that matrices $\mathbf{M}'_1$, $\mathbf{M}'_2$, and $\mathbf{M}'_3$ differ from one another, although each pair shares one row; however, because every $\mathbf{D}'_i$ satisfies $\mathbf{D}'_i\mathbf{d}_3 = \mathbf{0}$, all three should lead to the same estimated PPG-LPSI vector of coefficients and to the same estimated PPG-LPSI expected genetic gain per trait and selection response. It can be shown that for matrices $\mathbf{M}'_1$, $\mathbf{M}'_2$, and $\mathbf{M}'_3$, matrices $\widehat{\mathbf{Q}}_M$ and $\widehat{\mathbf{K}}_M = \mathbf{I}_4 - \widehat{\mathbf{Q}}_M$ are the same and can be written as

$$
\begin{aligned}
\widehat{\mathbf{Q}}\_M &= \begin{bmatrix}
0.771 & 0.080 & -0.145 & 0.026 \\
0.123 & 0.951 & 0.063 & -0.145 \\
-1.131 & 0.382 & 0.258 & -0.117 \\
0.118 & -0.087 & -0.031 & 0.020
\end{bmatrix} \quad \text{and} \\
\widehat{\mathbf{K}}\_M &= \begin{bmatrix}
0.229 & -0.080 & 0.145 & -0.026 \\
-0.123 & 0.049 & -0.063 & 0.145 \\
1.131 & -0.382 & 0.742 & 0.117 \\
-0.118 & 0.087 & 0.031 & 0.980
\end{bmatrix}.
\end{aligned}
$$

The estimated LPSI vector of coefficients was equal to $\widehat{\mathbf{b}}' = \begin{bmatrix} 0.554 & -1.053 & 1.090 & 1.058 \end{bmatrix}$, whereas the estimated PPG-LPSI vector of coefficients was $\widehat{\mathbf{b}}'_M = (\widehat{\mathbf{K}}_M\widehat{\mathbf{b}})' = \begin{bmatrix} 0.342 & -0.035 & 1.960 & 0.914 \end{bmatrix}$. The estimated PPG-LPSI was $\widehat{I}_M = 0.342T_1 - 0.035T_2 + 1.960T_3 + 0.914T_4$ and the estimated standard deviation of $\widehat{I}_M$ was $\widehat{\sigma}_{I_M} = \sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M} = 8.68$. The estimated correlation value between the PPG-LPSI and the net genetic merit was $\widehat{\rho}_{HI_P} = \widehat{\sigma}_{I_M}/\widehat{\sigma}_H = 0.775$, where $\widehat{\sigma}_H = \sqrt{\mathbf{w}'\widehat{\mathbf{G}}\mathbf{w}} = 11.202$ is the estimated standard deviation of the net genetic merit.

Using a selection intensity of 10% ($k_I = 1.755$), the estimated PPG-LPSI expected genetic gain per trait and the estimated selection response were $\widehat{\mathbf{E}}'_M = 1.755\dfrac{\widehat{\mathbf{b}}'_M\widehat{\mathbf{G}}}{\sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M}} = \begin{bmatrix} 6.410 & -2.747 & 4.579 & 1.496 \end{bmatrix}$ and $\widehat{R}_M = (1.755)\sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M} = (1.755)(8.68) = 15.23$, respectively.
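As a cross-check on the three-restriction example, the Python sketch below takes the three $\mathbf{D}'$ matrices listed above, confirms that each satisfies $\mathbf{D}'\mathbf{d}_3 = \mathbf{0}$, computes $\mathbf{M}' = \mathbf{D}'\mathbf{U}'_3\widehat{\mathbf{G}}$ for each, and evaluates $\mathbf{M}'\widehat{\mathbf{b}}_M$ with the reported PPG-LPSI coefficients. Because the coefficients are rounded to three decimals, the products should be close to zero rather than exactly zero. Names such as `D_forms` and `results` are illustrative.

```python
G = [
    [36.21, -12.93, 8.35, 2.74],
    [-12.93, 13.04, -3.40, -2.24],
    [8.35, -3.40, 9.96, 0.16],
    [2.74, -2.24, 0.16, 6.64],
]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

d = [7, -3, 5]                 # predetermined gains
C3_t = G[:3]                   # C_3' = U_3'G: first three rows of G
D_forms = {
    "D1": [[5, 0, -7], [0, 5, 3]],
    "D2": [[-3, -7, 0], [0, 5, 3]],
    "D3": [[-3, -7, 0], [5, 0, -7]],
}
b_M = [0.342, -0.035, 1.960, 0.914]   # reported PPG-LPSI coefficients

results = {}
for name, D_t in D_forms.items():
    Dd = [sum(x * y for x, y in zip(row, d)) for row in D_t]    # D'd
    M_t = matmul(D_t, C3_t)                                      # M' (2 x 4)
    Mb = [sum(m * c for m, c in zip(row, b_M)) for row in M_t]   # M'b_M
    results[name] = (Dd, M_t, Mb)
    print(name, Dd,
          [[round(x, 2) for x in row] for row in M_t],
          [round(x, 2) for x in Mb])
```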

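According to Eq. (3.14), the estimated Tallis (1985) vector of coefficients can be obtained as $\widehat{\mathbf{b}}_T = \widehat{\mathbf{b}}_R + \widehat{\theta}\widehat{\boldsymbol{\delta}}$, where $\widehat{\mathbf{b}}_R = \widehat{\mathbf{K}}\widehat{\mathbf{b}}$ is the estimated RLPSI vector of coefficients, $\widehat{\boldsymbol{\delta}} = \widehat{\mathbf{P}}^{-1}\mathbf{C}(\mathbf{C}'\widehat{\mathbf{P}}^{-1}\mathbf{C})^{-1}\mathbf{d}$, $\widehat{\theta} = \dfrac{\widehat{\mathbf{b}}'\mathbf{C}(\mathbf{C}'\widehat{\mathbf{P}}^{-1}\mathbf{C})^{-1}\mathbf{d}}{\mathbf{d}'(\mathbf{C}'\widehat{\mathbf{P}}^{-1}\mathbf{C})^{-1}\mathbf{d}}$ is the estimated constant of proportionality, $\widehat{\mathbf{b}} = \widehat{\mathbf{P}}^{-1}\widehat{\mathbf{G}}\mathbf{w}$ is the estimated LPSI vector of coefficients, and $\mathbf{d}' = \begin{bmatrix} d_1 & d_2 & \cdots & d_r \end{bmatrix}$ is the vector of predetermined restrictions.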

In Sect. 3.1.4 of this chapter we described how to obtain $\widehat{\mathbf{b}}_R = \widehat{\mathbf{K}}\widehat{\mathbf{b}}$, and we also obtained matrix $\mathbf{C}' = \mathbf{U}'\widehat{\mathbf{G}}$ for two and three null restrictions as

$$\mathbf{C}'_2 = \mathbf{U}'_2\widehat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \end{bmatrix} \quad \text{and} \quad \mathbf{C}'_3 = \mathbf{U}'_3\widehat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \\ 8.35 & -3.40 & 9.96 & 0.16 \end{bmatrix},$$

whence the $\widehat{\mathbf{b}}_R = \widehat{\mathbf{K}}\widehat{\mathbf{b}}$ values for two and three null restrictions were $\widehat{\mathbf{b}}'_{R_2} = \begin{bmatrix} -0.164 & 0.162 & 0.680 & 0.856 \end{bmatrix}$ and $\widehat{\mathbf{b}}'_{R_3} = \begin{bmatrix} -0.032 & 0.136 & 0.059 & 0.890 \end{bmatrix}$ respectively.

The $\hat{\theta}$ and $\hat{\boldsymbol{\delta}}$ values for two and three predetermined restrictions were

$$\hat{\theta}_2 = \frac{\hat{\mathbf{b}}'\mathbf{C}_2(\mathbf{C}'_2\hat{\mathbf{P}}^{-1}\mathbf{C}_2)^{-1}\mathbf{d}_2}{\mathbf{d}'_2(\mathbf{C}'_2\hat{\mathbf{P}}^{-1}\mathbf{C}_2)^{-1}\mathbf{d}_2} = 6.213, \qquad \hat{\theta}_3 = \frac{\hat{\mathbf{b}}'\mathbf{C}_3(\mathbf{C}'_3\hat{\mathbf{P}}^{-1}\mathbf{C}_3)^{-1}\mathbf{d}_3}{\mathbf{d}'_3(\mathbf{C}'_3\hat{\mathbf{P}}^{-1}\mathbf{C}_3)^{-1}\mathbf{d}_3} = 4.529,$$

$$\hat{\boldsymbol{\delta}}'_2 = \left[\hat{\mathbf{P}}^{-1}\mathbf{C}_2(\mathbf{C}'_2\hat{\mathbf{P}}^{-1}\mathbf{C}_2)^{-1}\mathbf{d}_2\right]' = [0.153 \;\; -0.052 \;\; 0.083 \;\; 0.024]$$

and

$$\hat{\boldsymbol{\delta}}'_3 = \left[\hat{\mathbf{P}}^{-1}\mathbf{C}_3(\mathbf{C}'_3\hat{\mathbf{P}}^{-1}\mathbf{C}_3)^{-1}\mathbf{d}_3\right]' = [0.083 \;\; -0.038 \;\; 0.420 \;\; 0.005].$$

With these results, the estimated Tallis (1985) vectors of coefficients for two and three predetermined restrictions were $\hat{\mathbf{b}}'_{T_2} = [0.793 \;\; -0.159 \;\; 1.194 \;\; 1.004]$ and $\hat{\mathbf{b}}'_{T_3} = [0.342 \;\; -0.035 \;\; 1.960 \;\; 0.914]$ respectively. These two vectors are the same as those obtained using the Mallard (1972) method for two and three predetermined restrictions, which corroborates that the Mallard (1972) and Tallis (1985) PPG-LPSIs are the same.
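The decomposition of the Tallis (1985) coefficients into the RLPSI part plus $\hat{\theta}\hat{\boldsymbol{\delta}}$ can be reproduced from the rounded numbers reported in this example. This is only an arithmetic sketch: it assumes that the leading minus sign in the printed RLPSI vectors belongs to their first element, and it tolerates small discrepancies because every input is rounded to three decimals.

```python
import numpy as np

# Rounded estimates reported in the text (two and three restrictions).
b_R2 = np.array([-0.164, 0.162, 0.680, 0.856])
b_R3 = np.array([-0.032, 0.136, 0.059, 0.890])
theta2, theta3 = 6.213, 4.529
delta2 = np.array([0.153, -0.052, 0.083, 0.024])
delta3 = np.array([0.083, -0.038, 0.420, 0.005])

# Tallis (1985): b_T = b_R + theta * delta.
b_T2 = b_R2 + theta2 * delta2
b_T3 = b_R3 + theta3 * delta3

# Vectors reported in the text for comparison.
b_T2_reported = np.array([0.793, -0.159, 1.194, 1.004])
b_T3_reported = np.array([0.342, -0.035, 1.960, 0.914])

print("largest gap, two restrictions:  ", np.abs(b_T2 - b_T2_reported).max())
print("largest gap, three restrictions:", np.abs(b_T3 - b_T3_reported).max())
```

Both recomputed vectors agree with the reported ones to within the precision that three-decimal rounding of the inputs allows.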

With the data set described in Sect. 2.8.1 of Chap. 2 we constructed Table 3.4, which presents the estimated LPSI selection response and heritability, and the estimated PPG-LPSI selection response and heritability for one, two, and three predetermined restrictions for seven simulated selection cycles using a selection intensity of 10% ($k_I = 1.755$). The averages of the estimated PPG-LPSI selection responses were 14.19, 14.00, and 12.58 for one, two, and three restrictions respectively. Note that 14.19 is also the average value of the estimated LPSI selection response. This means that the PPG-LPSI and the LPSI selection responses are the same when there is only one predetermined restriction. However, the estimated PPG-LPSI selection responses for two and three restrictions tend to decrease (Table 3.4). The same is true for the estimated PPG-LPSI heritability. That is, the estimated PPG-LPSI heritability for one predetermined restriction is equal to the estimated LPSI heritability. The estimated PPG-LPSI heritability for two predetermined restrictions decreased, but increased for three predetermined restrictions (Table 3.4). This is because the simulated true heritabilities of traits $T_1$, $T_2$, $T_3$, and $T_4$ were 0.4, 0.6, 0.6, and 0.8 respectively.

Table 3.5 presents the estimated LPSI expected genetic gain per trait without restrictions, and the estimated PPG-LPSI expected genetic gain per trait for one, two, and three predetermined restrictions for seven simulated selection cycles using a selection intensity of 10% ($k_I = 1.755$). Once again, note that for one predetermined restriction, the estimated PPG-LPSI expected genetic gains were equal to the estimated LPSI expected genetic gains, and for two predetermined restrictions, the estimated PPG-LPSI expected genetic gains were similar to the estimated LPSI expected genetic gains; however, for three predetermined restrictions, the estimated PPG-LPSI expected genetic gains tended to decrease.


Table 3.4 Estimated LPSI selection response and heritability, and estimated predetermined proportional gain LPSI (PPG-LPSI) selection response and heritability for one, two, and three predetermined restrictions for seven simulated selection cycles

The selection intensity was 10% ($k_I = 1.755$) and the vectors of predetermined proportional gains for one, two, and three predetermined restrictions were $\mathbf{d}'_1 = 7$, $\mathbf{d}'_2 = [7 \;\; -3]$ and $\mathbf{d}'_3 = [7 \;\; -3 \;\; 5]$ respectively


Table 3.5 Estimated LPSI expected genetic gain per trait, and estimated PPG-LPSI expected genetic gain per trait for one, two, and three predetermined restrictions for seven simulated selection cycles

The selection intensity was 10% ($k_I = 1.755$) and the vectors of predetermined proportional gains for one, two, and three restrictions were $\mathbf{d}'_1 = 7$, $\mathbf{d}'_2 = [7 \;\; -3]$ and $\mathbf{d}'_3 = [7 \;\; -3 \;\; 5]$ respectively

The first part of Table 3.6 presents the estimated correlation of the net genetic merit ($H = \mathbf{w}'\mathbf{g}$) with the estimated LPSI and RLPSI values for one, two, and three null restrictions. In addition, this first part presents the estimated LPSI versus RLPSI efficiency $p = 100(\lambda_R - 1)$ (Eq. 2.21, Chap. 2). The second part of Table 3.6 presents the estimated correlation of $H = \mathbf{w}'\mathbf{g}$ with the estimated LPSI and PPG-LPSI values for one, two, and three predetermined restrictions, and the estimated LPSI versus PPG-LPSI efficiency $p = 100(\lambda_P - 1)$. Finally, the third part of Table 3.6 presents the estimated variance of the predicted error (VPE) of the LPSI, $(1 - \rho^2_{HI})\sigma^2_H$, the RLPSI, $(1 - \rho^2_{HI_R})\sigma^2_H$, and the PPG-LPSI, $(1 - \rho^2_{HI_P})\sigma^2_H$, for one, two, and three restrictions for seven simulated selection cycles.

The estimated VPE of the RLPSI is higher than that of the LPSI and PPG-LPSI for one, two, and three restrictions for the seven simulated selection cycles; however, the estimated VPE of PPG-LPSI is only greater than that of the LPSI for two and three predetermined restrictions.

Table 3.6 Correlation of the net genetic merit with the LPSI, the RLPSI, and the PPG-LPSI for one, two, and three null and predetermined restrictions; LPSI versus RLPSI efficiency and LPSI versus PPG-LPSI efficiency, and estimated variance of the predicted error (VPE) of the LPSI, the RLPSI, and the PPG-LPSI for one, two, and three restrictions for seven simulated selection cycles


Thus, according to the results obtained for the LPSI, the RLPSI, and the PPG-LPSI, the best predictor of the net genetic merit was the LPSI followed by the PPG-LPSI and the RLPSI.

#### 3.3 The Desired Gains Linear Phenotypic Selection Index

The most important aspect of the desired gains linear phenotypic selection index (DG-LPSI) is that it does not require economic weights. Note that the LPSI expected genetic gain per trait $\mathbf{E} = k_I\dfrac{\mathbf{G}\mathbf{b}}{\sigma_I}$ is maximized when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ and depends on $k_I$ and $\sigma_I$. Now let $\mathbf{G}\mathbf{b}$ be written as

$$\mathbf{G}\mathbf{b}=\mathbf{d},\tag{3.21}$$

where d is the vector of desired gains. From Eq. (3.21), E can be written as

$$\mathbf{E} = k\_I \frac{\mathbf{d}}{\sigma\_I}.\tag{3.22}$$

Equation (3.22) indicates that $\mathbf{E}$ is inversely proportional to $\sigma_I$; hence, minimizing $\sigma_I$ with respect to $\mathbf{b}$ subject to the constraint $\mathbf{G}\mathbf{b} = \mathbf{d}$ maximizes $\mathbf{E}$ (Brascamp 1984; Itoh and Yamada 1986). That is, we need to take the derivative of the function

$$\Phi_{DG}(\mathbf{b}, \mathbf{v}) = 0.5(\mathbf{b}'\mathbf{P}\mathbf{b}) + \mathbf{v}'(\mathbf{G}\mathbf{b} - \mathbf{d})\tag{3.23}$$

with respect to $\mathbf{b}$ and $\mathbf{v}$, where $\mathbf{v}$ is a vector of Lagrange multipliers, assuming that $\mathbf{P}$, $\mathbf{G}$, and $\mathbf{d}$ are known. The restriction $\mathbf{G}\mathbf{b} = \mathbf{d}$ in Eq. (3.23) is similar to the Tallis (1985) restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$, but with $\mathbf{U}' = \mathbf{I}$ and $\theta = 1$, or $\theta = k_I/\sigma_I$ (Tallis 1962).

It can be shown that the vector that minimizes $\sigma_I$ and maximizes $\mathbf{E}$ can be written as

$$\mathbf{b}\_{DG} = \mathbf{P}^{-1} \mathbf{G} \left( \mathbf{G} \mathbf{P}^{-1} \mathbf{G} \right)^{-1} \mathbf{d}. \tag{3.24}$$

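Thus, in effect, as $\mathbf{G}\mathbf{b} = \mathbf{d}$, $\mathbf{b}_{DG} = \mathbf{P}^{-1}\mathbf{G}(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d} = \mathbf{P}^{-1}\mathbf{G}(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{G}\mathbf{b} = \mathbf{b}$. In Eq. (3.24) we are assuming that the traits in the index are the same as those in the net genetic merit. However, this may not be the case; that is, the number of traits in the index could differ from the number of genotypic values in the net genetic merit. In the latter case, Eq. (3.21) should be written as $\mathbf{G}'\mathbf{b} = \mathbf{d}$ and Eq. (3.24) as $\mathbf{b}_{DG} = \mathbf{P}^{-1}\mathbf{G}(\mathbf{G}'\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d}$ (Itoh and Yamada 1986).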

According to Itoh and Yamada (1986, 1988), Eq. (3.24) maximizes neither the correlation between $I$ and $H$ ($\rho_{IH}$) nor the selection response, because the covariance between $I$ and $H$ is not defined: $Cov(H, I) = \mathbf{w}'\mathbf{G}\mathbf{b}$ requires the vector of economic weights $\mathbf{w}$, and the DG-LPSI does not use economic weights. However, note that because $\mathbf{G}\mathbf{b} = \mathbf{d}$, the variance of the DG-LPSI is $Var(I_{DG}) = \mathbf{d}'(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d} = \mathbf{b}'\mathbf{P}\mathbf{b}$.
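A short numerical sketch of Eq. (3.24) may help to see that the DG-LPSI coefficients reproduce the desired gains exactly and that the two expressions for the index variance coincide. The matrices below are assumptions made only for illustration: $\mathbf{G}$ reuses the rounded three-trait block printed in the PPG-LPSI example of this chapter, while the residual variances added to form $\mathbf{P}$ and the vector $\mathbf{d}$ are hypothetical.

```python
import numpy as np

# Genotypic covariance matrix (rounded three-trait block from the earlier example).
G = np.array([[36.21, -12.93,  8.35],
              [-12.93, 13.04, -3.40],
              [ 8.35,  -3.40,  9.96]])
# Hypothetical phenotypic covariance: G plus illustrative residual variances.
P = G + np.diag([54.0, 9.0, 7.0])
d = np.array([7.0, -3.0, 5.0])            # illustrative vector of desired gains

P_inv_G = np.linalg.solve(P, G)           # P^{-1} G
GPG = G @ P_inv_G                         # G P^{-1} G

# Eq. (3.24): b_DG = P^{-1} G (G P^{-1} G)^{-1} d
b_DG = P_inv_G @ np.linalg.solve(GPG, d)

# The constraint G b = d and the two expressions for Var(I_DG).
achieved = G @ b_DG
var_index = b_DG @ P @ b_DG
var_from_d = d @ np.linalg.solve(GPG, d)

print("b_DG      :", np.round(b_DG, 4))
print("G b_DG    :", np.round(achieved, 4))
print("Var(I_DG) :", round(var_index, 4), "and", round(var_from_d, 4))
```

Linear solves are used instead of explicit inverses, which is a numerical design choice and does not change the result of Eq. (3.24).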

In practice, $\mathbf{d}$ is chosen arbitrarily and then we are in the same situation as when economic weights need to be selected. Pesek and Baker (1969), Yamada et al. (1975), and Itoh and Yamada (1986, 1988) argued that this should not be a problem for experienced breeders because they must know the relative merits and demerits of their strains. However, this may be true only for some breeders, and the selection of $\mathbf{d}$ is always subjective. Another problem with this index is that, as it is not associated with $H = \mathbf{w}'\mathbf{g}$, it is not a predictor of $H$.

#### 3.4 Applicability of the LPSI, RLPSI, and PPG-LPSI

In the context of animal breeding, Hazel (1943) pointed out that because any index is constructed from data on a herd in one locality, it may not be widely applicable. The reasons for this are:


#### References

Brascamp EW (1984) Selection indices with constraints. Anim Breed Abstr 52(9):645–654



### Chapter 4 Linear Marker and Genome-Wide Selection Indices

Abstract There are two main linear marker selection indices employed in marker-assisted selection (MAS) to predict the net genetic merit and to select individual candidates as parents for the next generation: the linear marker selection index (LMSI) and the genome-wide LMSI (GW-LMSI). Both indices maximize the selection response, the expected genetic gain per trait, and the correlation with the net genetic merit; however, applying the LMSI in plant or animal breeding requires genotyping the candidates for selection; performing a linear regression of phenotypic values on the coded values of the markers such that the selected markers are statistically linked to quantitative trait loci that explain most of the variability in the regression model; constructing the marker score; and combining the marker score with phenotypic information to predict and rank the net genetic merit of the candidates for selection. On the other hand, the GW-LMSI is a single-stage procedure that treats information at each individual marker as a separate trait. Thus, all marker information can be entered together with phenotypic information into the GW-LMSI, which is then used to predict the net genetic merit and select candidates. We describe the LMSI and GW-LMSI theory and show that both indices are direct applications of the linear phenotypic selection index theory to MAS. Using real and simulated data we validated the theory of both indices.

#### 4.1 The Linear Marker Selection Index

#### 4.1.1 Basic Conditions for Constructing the LMSI

In Chap. 2, Sect. 2.1, we indicated ten basic conditions for constructing a valid linear phenotypic selection index (LPSI). These ten conditions are also necessary for the linear marker selection index (LMSI); however, in addition to those conditions, the LMSI also requires the following conditions:


Under these conditions, the LMSI should be more efficient than the LPSI, at least in the first selection cycles (Whittaker 2003; Moreau et al. 2007).

#### 4.1.2 The LMSI Parameters

Let $y_i = g_i + e_i$ be the $i$th trait ($i = 1, 2, \ldots, t$; $t$ = number of traits), where $e_i \sim N(0, \sigma^2_{e_i})$ is the residual with expectation equal to zero and variance $\sigma^2_{e_i}$, and $N$ stands for the normal distribution. Assuming that the QTL effects combine additively both within and between loci, the $i$th unobservable genetic value $g_i$ can be written as

$$g_i = \sum_{k=1}^{N_Q} \alpha_k q_k,\tag{4.1}$$

where $\alpha_k$ is the effect of the $k$th QTL, $q_k$ is the number of favorable alleles at the $k$th QTL (2, 1 or 0), and $N_Q$ is the number of QTL affecting the $i$th trait of interest.

If the QTL effect values are not observable, the $g_i$ values in Eq. (4.1) are also not observable; however, we can use a linear combination of the markers linked to the QTL that affect the $i$th trait ($s_i$) to predict the $g_i$ value as

$$s_i = \sum_{j=1}^{M} \theta_j x_j,\tag{4.2}$$

where $s_i$ is a predictor of $g_i$, $\theta_j$ is the regression coefficient of the linear regression model, $x_j$ is the coded value of the $j$th marker (e.g., 1, 0, and -1 for marker genotypes AA, Aa and aa respectively), and $M$ is the number of selected markers linked to the QTL that affect the $i$th trait. Equation (4.2) is called the marker score (Lande and Thompson 1990; Whittaker 2003) and this is the main reason why the LMSI is not equal to the LPSI described in Chap. 2. The number of selected markers is only a subset of potential markers linked to QTL in the population under selection; thus, the $s_i$ values should be lower than or equal to the $g_i$ values. One way of estimating the $s_i$ values is to perform a linear regression of phenotypic values on the coded values of the markers, select markers that are statistically linked to quantitative trait loci that explain most of the variability in the regression model, and then obtain the estimated value of $s_i$ ($\hat{s}_i$) as the sum of the estimated effects of the QTL linked to the selected markers multiplied by the coded marker values associated with the $i$th trait. Some authors (e.g., Moreau et al. 2007) call $\hat{s}_i$ the molecular score; in this book, we call $s_i$ the marker score and $\hat{s}_i$ the estimated marker score.
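The regression step that produces the estimated marker score can be sketched with simulated data. Everything in this example is hypothetical (sample size, number of markers, effect sizes, residual variance), and for brevity it skips the marker-selection step described above and regresses on all simulated markers at once; it is an illustration of Eq. (4.2), not the procedure used for the data sets in this book.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_markers = 200, 5

# Hypothetical marker codes: 1, 0, -1 for genotypes AA, Aa, aa.
X = rng.choice([1.0, 0.0, -1.0], size=(n, n_markers))
theta_true = np.array([1.5, 0.0, -0.8, 0.0, 0.5])     # simulated marker effects
y = X @ theta_true + rng.normal(scale=2.0, size=n)    # phenotype = marker signal + residual

# Least-squares regression of centred phenotypes on centred marker codes.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
theta_hat, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# Estimated marker score (Eq. 4.2): sum over markers of theta_hat_j * x_j.
s_hat = Xc @ theta_hat

print("estimated marker effects:", np.round(theta_hat, 3))
print("variance of phenotype   :", round(yc.var(), 3))
print("variance of marker score:", round(s_hat.var(), 3))
```

Because the fitted values are a projection of the centred phenotypes, the variance of the estimated marker score cannot exceed the phenotypic variance in the same sample.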

The objective of the LMSI is to predict the net genetic merit of each individual and select the individuals with the highest net genetic merit for further breeding. In the LMSI context, the net genetic merit can be written as

$$H = \mathbf{w}'\mathbf{g} + \mathbf{w}'\_2\mathbf{s} = \begin{bmatrix} \mathbf{w}' & \mathbf{w}'\_2 \end{bmatrix} \begin{bmatrix} \mathbf{g} \\ \mathbf{s} \end{bmatrix} = \mathbf{a}'\mathbf{z},\tag{4.3}$$

where $\mathbf{g}' = [g_1 \;\; \cdots \;\; g_t]$ is the vector of breeding values; $\mathbf{w}' = [w_1 \;\; \cdots \;\; w_t]$ is the vector of economic weights associated with $\mathbf{g}$; $\mathbf{w}'_2 = [0_1 \;\; \cdots \;\; 0_t]$ is a null vector associated with the vector of marker scores $\mathbf{s}' = [s_1 \;\; \cdots \;\; s_t]$; $s_i$ is the $i$th marker score; $\mathbf{a}' = [\mathbf{w}' \;\; \mathbf{w}'_2]$ and $\mathbf{z}' = [\mathbf{g}' \;\; \mathbf{s}']$.

The information provided by the marker score can be used in breeding programs to increase the accuracy of predicting the net genetic merit of the individuals under selection. The LMSI combines the phenotypic and marker scores to predict H in each selection cycle and can be written as

$$I_M = \boldsymbol{\beta}'_y\mathbf{y} + \boldsymbol{\beta}'_s\mathbf{s} = \begin{bmatrix} \boldsymbol{\beta}'_y & \boldsymbol{\beta}'_s \end{bmatrix} \begin{bmatrix} \mathbf{y} \\ \mathbf{s} \end{bmatrix} = \boldsymbol{\beta}'\mathbf{t},\tag{4.4}$$

where $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_s$ are vectors of phenotypic and marker score weights respectively; $\mathbf{y}' = [y_1 \;\; \cdots \;\; y_t]$ is the vector of trait phenotypic values and $\mathbf{s}$ was defined in Eq. (4.3); $\boldsymbol{\beta}' = [\boldsymbol{\beta}'_y \;\; \boldsymbol{\beta}'_s]$ and $\mathbf{t}' = [\mathbf{y}' \;\; \mathbf{s}']$.

The LMSI selection response can be written as

$$R_M = k_I\sigma_H\rho_{I_M H} = k_I\sigma_H\frac{\mathbf{a}'\mathbf{Z}_M\boldsymbol{\beta}}{\sqrt{\mathbf{a}'\mathbf{Z}_M\mathbf{a}}\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}},\tag{4.5}$$

where $k_I$ is the standardized selection differential of the LMSI; $\sigma_H = \sqrt{\mathbf{a}'\mathbf{Z}_M\mathbf{a}}$ and $\sigma_{I_M} = \sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$ are the standard deviations of $H$ and $I_M$, whereas $\rho_{I_M H}$ and $\mathbf{a}'\mathbf{Z}_M\boldsymbol{\beta}$ are the correlation and the covariance between $H$ and $I_M$ respectively; $\mathbf{T}_M = Var\begin{bmatrix} \mathbf{y} \\ \mathbf{s} \end{bmatrix} = \begin{bmatrix} \mathbf{P} & \mathbf{S} \\ \mathbf{S} & \mathbf{S} \end{bmatrix}$ and $\mathbf{Z}_M = Var\begin{bmatrix} \mathbf{g} \\ \mathbf{s} \end{bmatrix} = \begin{bmatrix} \mathbf{C} & \mathbf{S} \\ \mathbf{S} & \mathbf{S} \end{bmatrix}$ are block covariance matrices, where $\mathbf{P} = Var(\mathbf{y})$, $\mathbf{S} = Var(\mathbf{s})$, and $\mathbf{C} = Var(\mathbf{g})$ are the covariance matrices of phenotypic values ($\mathbf{y}$), the marker score ($\mathbf{s}$), and the genetic value ($\mathbf{g}$) respectively in the population. Vectors $\mathbf{a}$ and $\boldsymbol{\beta}$ were defined in Eqs. (4.3) and (4.4) respectively.

The LMSI expected genetic gain per trait can be written as


$$\mathbf{E}_M = k_I\frac{\mathbf{Z}_M\boldsymbol{\beta}}{\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}.\tag{4.6}$$

All the parameters in Eq. (4.6) were previously defined.

#### 4.1.3 The Maximized LMSI Parameters

Suppose that $\mathbf{P}$, $\mathbf{S}$ and $\mathbf{C}$ are known matrices; then, matrices $\mathbf{T}_M$ and $\mathbf{Z}_M$ are known and, according to the LPSI theory (see Chap. 2 for details), the LMSI vector of coefficients ($\boldsymbol{\beta}$) that maximizes $\rho_{I_M H}$, $R_M$, and $\mathbf{E}_M$ can be written as

$$\boldsymbol{\beta} = \mathbf{T}_M^{-1}\mathbf{Z}_M\mathbf{a},\tag{4.7}$$

whence the maximized selection response and the maximized correlation (or LMSI accuracy) between H and IM can be written as

$$R_M = k_I\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}},\tag{4.8a}$$

and

$$
\rho\_{I\_M H} = \frac{\sigma\_{I\_M}}{\sigma\_H},
\tag{4.8b}
$$

respectively, where $\sigma_{I_M} = \sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$ is the standard deviation of $I_M$ and $\sigma_H = \sqrt{\mathbf{a}'\mathbf{Z}_M\mathbf{a}}$ is the standard deviation of $H$. Equations (4.8a) and (4.8b) show that the LMSI is a direct application of the LPSI theory in the marker-assisted selection (MAS) context.

Let $\mathbf{Q} = \mathbf{T}_M^{-1}\mathbf{Z}_M$; then, matrix $\mathbf{Q}$ can be written as

$$\mathbf{Q} = \begin{bmatrix} \left(\mathbf{P} - \mathbf{S}\right)^{-1} (\mathbf{C} - \mathbf{S}) & \mathbf{0} \\ \mathbf{I} - \left(\mathbf{P} - \mathbf{S}\right)^{-1} (\mathbf{C} - \mathbf{S}) & \mathbf{I} \end{bmatrix},\tag{4.9}$$

whence $\boldsymbol{\beta} = \mathbf{Q}\mathbf{a}$, and as $\mathbf{w}'_2 = [0_1 \;\; \cdots \;\; 0_t]$, we can write the two vectors of $\boldsymbol{\beta}' = [\boldsymbol{\beta}'_y \;\; \boldsymbol{\beta}'_s]$ as

$$\boldsymbol{\beta}_y = (\mathbf{P} - \mathbf{S})^{-1}(\mathbf{C} - \mathbf{S})\mathbf{w} \quad \text{and} \quad \boldsymbol{\beta}_s = \left[\mathbf{I} - (\mathbf{P} - \mathbf{S})^{-1}(\mathbf{C} - \mathbf{S})\right]\mathbf{w}.\tag{4.10a}$$

Another way of writing the marker score vector weights is

$$\boldsymbol{\beta}_s = \mathbf{w} - \boldsymbol{\beta}_y,\tag{4.10b}$$

where $\boldsymbol{\beta}_y = (\mathbf{P} - \mathbf{S})^{-1}(\mathbf{C} - \mathbf{S})\mathbf{w}$. By Eq. (4.10b), the optimal LMSI can be written as

$$I_M = \mathbf{w}'\mathbf{s} + \boldsymbol{\beta}'_y(\mathbf{y} - \mathbf{s}).\tag{4.11}$$

Equation (4.11) indicates that, in practice, to estimate the optimal LMSI, we only need to estimate the vector of coefficients $\boldsymbol{\beta}_y$. By Eq. (4.10a), Eq. (4.8a) can be written as

$$R_M = k_I\sqrt{\mathbf{w}'\mathbf{C}(\mathbf{P} - \mathbf{S})^{-1}(\mathbf{C} - \mathbf{S})\mathbf{w} + \mathbf{w}'\mathbf{S}\left[\mathbf{I} - (\mathbf{P} - \mathbf{S})^{-1}(\mathbf{C} - \mathbf{S})\right]\mathbf{w}}.\tag{4.12}$$

Thus, by Eqs. (4.10a) and (4.12), when $\mathbf{S}$ is a null matrix, vector $\boldsymbol{\beta}_y$ is equal to $\mathbf{P}^{-1}\mathbf{C}\mathbf{w} = \mathbf{b}$ and $R_M = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = R_I$, which are the LPSI vector of coefficients and its selection response respectively.

Assume that when the number of markers and genotypes tend to infinity, $\mathbf{S}$ tends to $\mathbf{C}$; then, at the limit, we can suppose that $\mathbf{S} = \mathbf{C}$, and by this latter result, $R_M$ is equal to

$$k\_I \sqrt{\mathbf{w}' \mathbf{C} \mathbf{w}}.\tag{4.13}$$

That is, Eq. (4.13) is the maximum value of the LMSI selection response when the numbers of markers and genotypes tend to infinity. Thus, the possible LMSI selection response values of Eq. (4.12) should be between $k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ and $k_I\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$, i.e.,

$$k\_I \sqrt{\mathbf{b}' \mathbf{P} \mathbf{b}} \le R\_M \le k\_I \sqrt{\mathbf{w}' \mathbf{C} \mathbf{w}},\tag{4.14}$$

or, dividing by $R_I = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$, the ratio $R_M/R_I$ lies between 1 and $\dfrac{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} = \dfrac{\sigma_H}{\sigma_I}$, that is,

$$1 \le \frac{R_M}{R_I} \le \frac{\sigma_H}{\sigma_I}.\tag{4.15}$$

Note that $\dfrac{\sigma_H}{\sigma_I} = \dfrac{1}{\rho_{HI}}$, where $\rho_{HI}$ is the maximized correlation between the net genetic merit ($H$) and the LPSI ($I$) described in Chap. 2. Equation (4.15) indicates that the upper bound of LMSI efficiency tends to infinity when the $\rho_{HI}$ value tends to zero; this is an additional way of expressing the paradox of LMSI efficiency described by Knapp (1998).
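The relations among Eqs. (4.7), (4.10a), (4.10b), (4.12), and (4.14) can be verified numerically. The sketch below uses small hypothetical covariance matrices for two traits, chosen only so that $\mathbf{S}$ is smaller than $\mathbf{C}$ and $\mathbf{C}$ is smaller than $\mathbf{P}$; none of the numbers come from the data sets used in this book.

```python
import numpy as np

# Hypothetical covariance matrices for t = 2 traits (illustrative values only).
C = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # Var(g), genotypic covariance
S = 0.5 * C                           # Var(s): markers explain half of C
P = C + np.diag([6.0, 2.0])           # Var(y), phenotypic covariance
w = np.array([1.0, -1.0])             # economic weights
k_I = 1.755                           # selection intensity for 10% selected

T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.concatenate([w, np.zeros_like(w)])

beta = np.linalg.solve(T_M, Z_M @ a)                       # Eq. (4.7)
beta_y = np.linalg.solve(P - S, (C - S) @ w)               # Eq. (4.10a)
beta_s = w - beta_y                                        # Eq. (4.10b)

R_M = k_I * np.sqrt(beta @ T_M @ beta)                     # Eq. (4.8a)
R_M_alt = k_I * np.sqrt(w @ C @ beta_y + w @ S @ beta_s)   # Eq. (4.12)

b = np.linalg.solve(P, C @ w)                              # LPSI coefficients
R_I = k_I * np.sqrt(b @ P @ b)                             # lower bound in Eq. (4.14)
R_max = k_I * np.sqrt(w @ C @ w)                           # upper bound in Eq. (4.14)

print("beta_y, beta_s:", np.round(beta_y, 4), np.round(beta_s, 4))
print("R_I <= R_M <= R_max:", round(R_I, 3), round(R_M, 3), round(R_max, 3))
```

Solving the full block system and using the closed forms give the same coefficients, so in practice only $\boldsymbol{\beta}_y$ needs to be computed, as Eq. (4.11) indicates.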

#### 4.1.4 The LMSI for One Trait

For the one-trait case, matrices TM, ZM, and Q can be written as

$$\mathbf{T}_M = \begin{bmatrix} \sigma_y^2 & \sigma_s^2 \\ \sigma_s^2 & \sigma_s^2 \end{bmatrix}, \quad \mathbf{Z}_M = \begin{bmatrix} \sigma_g^2 & \sigma_s^2 \\ \sigma_s^2 & \sigma_s^2 \end{bmatrix} \quad \text{and} \quad \mathbf{Q} = \begin{bmatrix} \dfrac{\sigma_g^2 - \sigma_s^2}{\sigma_y^2 - \sigma_s^2} & 0 \\ \dfrac{\sigma_y^2 - \sigma_g^2}{\sigma_y^2 - \sigma_s^2} & 1 \end{bmatrix},\tag{4.16}$$

where $\sigma^2_y$, $\sigma^2_g$, and $\sigma^2_s$ are the phenotypic, genetic, and marker score variances respectively. By Eqs. (4.10a) and (4.10b), when $\mathbf{a}' = [1 \;\; 0]$, the elements of vector $\boldsymbol{\beta} = \mathbf{Q}\mathbf{a}$ are

$$
\beta\_\mathbf{y} = \frac{\sigma\_\mathbf{g}^2 - \sigma\_\mathbf{s}^2}{\sigma\_\mathbf{y}^2 - \sigma\_\mathbf{s}^2} \quad \text{and} \quad \beta\_\mathbf{s} = 1 - \beta\_\mathbf{y}, \tag{4.17a}
$$

whence the optimal LMSI can be written as

$$I\_M = \mathbf{s} + \boldsymbol{\beta}\_\mathbf{y} (\mathbf{y} - \mathbf{s});\tag{4.17b}$$

whereas by Eq. (4.12), the maximized LMSI selection response can be written as

$$R\_M = k\_I \sqrt{\frac{\sigma\_g^2 \left(\sigma\_g^2 - \sigma\_s^2\right) + \sigma\_s^2 \left(\sigma\_y^2 - \sigma\_g^2\right)}{\sigma\_y^2 - \sigma\_s^2}}.\tag{4.18}$$

When $\sigma^2_s = 0$, $\beta_y = \dfrac{\sigma^2_g}{\sigma^2_y} = h^2$, $I_M = h^2y$, and $R_M = k\dfrac{\sigma^2_g}{\sigma_y} = k\sigma_yh^2 = R$, the selection response for the one-trait case without markers.
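For the one-trait case, Eqs. (4.17a) and (4.18) are simple enough to evaluate by hand, but a small function makes the limiting behavior easy to check. This is a sketch with hypothetical variances; the function name `lmsi_one_trait` is introduced here only for illustration.

```python
import numpy as np

def lmsi_one_trait(var_y, var_g, var_s, k=1.755):
    """Return (beta_y, beta_s, R_M) for the one-trait LMSI, Eqs. (4.17a) and (4.18)."""
    beta_y = (var_g - var_s) / (var_y - var_s)
    beta_s = 1.0 - beta_y
    numerator = var_g * (var_g - var_s) + var_s * (var_y - var_g)
    R_M = k * np.sqrt(numerator / (var_y - var_s))
    return beta_y, beta_s, R_M

# Hypothetical variances: heritability 0.4, markers explain half the genetic variance.
var_y, var_g = 10.0, 4.0
beta_y, beta_s, R_M = lmsi_one_trait(var_y, var_g, var_s=2.0)

# Without marker information (var_s = 0) the index reduces to h^2 * y.
beta_y0, beta_s0, R_0 = lmsi_one_trait(var_y, var_g, var_s=0.0)
h2 = var_g / var_y

print("with markers   : beta_y =", beta_y, " beta_s =", beta_s, " R_M =", round(R_M, 3))
print("without markers: beta_y =", beta_y0, " R =", round(R_0, 3))
```

With these illustrative values the marker score receives most of the weight and the response exceeds that of phenotypic selection alone.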

#### 4.1.5 Efficiency of LMSI Versus LPSI Efficiency for One Trait

Suppose that the intensity of selection is the same in both indices; then, to compare LMSI versus LPSI efficiency for predicting the net genetic merit, we can use the ratio $\lambda_M = \dfrac{\rho_{I_M H}}{\rho_{HI}} = \dfrac{R_M}{R_I}$ (Bulmer 1980; Moreau et al. 1998), where $R_I$ is the maximized LPSI selection response. In percentage terms, the LMSI versus LPSI efficiency can be written as

$$p\_M = 100(\lambda\_M - 1). \tag{4.19}$$

When $p_M = 0$, the efficiency of both indices is the same; when $p_M > 0$, the efficiency of the LMSI is higher than that of the LPSI, and when $p_M < 0$, LPSI efficiency is higher than LMSI efficiency for predicting the net genetic merit.

In the case of one trait, Lande and Thompson (1990) showed that LMSI efficiency (not in percentage terms) with respect to phenotypic efficiency can be written as

$$
\lambda\_M = \frac{R\_M}{R} = \sqrt{\frac{q}{h^2} + \frac{\left(1 - q\right)^2}{1 - qh^2}},\tag{4.20}
$$

where $R_M$ was defined in Eq. (4.18), $R = k\sigma_yh^2$, $h^2$ is the trait heritability, and $q = \dfrac{\sigma^2_s}{\sigma^2_g}$ is the proportion of additive genetic variance explained by the markers. According to Eq. (4.20), the advantage of the LMSI over phenotypic selection increases as the population size increases and heritability decreases, because in such cases, $q$ tends to 1 and Eq. (4.20) approaches $\dfrac{1}{h}$. Therefore, the LMSI is most efficient for traits with low heritability and when the marker score explains a large proportion of the genetic variance. Note that when $h^2$ tends to zero, $\dfrac{1}{h}$ tends to infinity; this means that in the asymptotic context, LMSI efficiency with respect to phenotypic efficiency for one trait (Eq. 4.20) tends to infinity, and this is the LMSI paradox pointed out by Knapp (1998). There are other problems associated with the LMSI: it increases the selection response only in the short term and can result in lower cumulative responses in the longer term than phenotypic selection, as the LMSI fixes the QTL at a faster rate than phenotypic selection. In addition, it requires the weights (Eq. 4.17a) to be updated, because in each generation the frequency of the QTL changes (Dekkers and Settar 2004).
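The behavior of Eq. (4.20) described above can be tabulated for a few combinations of heritability and proportion of genetic variance explained by the markers. The grid of values below is arbitrary and chosen only for illustration; the function name is hypothetical.

```python
import numpy as np

def lmsi_efficiency(q, h2):
    """Eq. (4.20): LMSI response relative to phenotypic selection for one trait."""
    return np.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))

for h2 in (0.1, 0.4, 0.8):
    for q in (0.0, 0.5, 1.0):
        lam = lmsi_efficiency(q, h2)
        p_M = 100.0 * (lam - 1.0)          # Eq. (4.19), efficiency in percentage terms
        print(f"h2={h2:.1f}  q={q:.1f}  lambda_M={lam:.3f}  p_M={p_M:.1f}%")
```

When $q = 0$ the ratio equals 1 (no advantage over phenotypic selection), and when $q = 1$ it equals $1/h$, which grows without bound as the heritability approaches zero.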

#### 4.1.6 Statistical LMSI Properties

Assume that $H$ and $I_M$ have a bivariate joint normal distribution, $\boldsymbol{\beta} = \mathbf{T}_M^{-1}\mathbf{Z}_M\mathbf{a}$, and that $\mathbf{P}$, $\mathbf{C}$, $\mathbf{S}$, and $\mathbf{w}$ are known; then, the statistical LMSI properties are the same as the LPSI properties described in Chap. 2. That is,


Properties 1 to 4 are the same as LPSI properties 1 to 4, but, because the LMSI jointly incorporates the phenotypic and marker information to predict the net genetic merit, LMSI accuracy should be higher than LPSI accuracy. The same is true of the LMSI selection response and expected genetic gain per trait when compared with the LPSI selection response and expected genetic gain per trait.

#### 4.2 The Genome-Wide Linear Selection Index

The genome-wide linear marker selection index (GW-LMSI) is a single-stage procedure that treats information at each individual marker as a separate trait. Thus, all marker information can be entered together with phenotypic information into the GW-LMSI, which is then used to predict the net genetic merit. In a similar manner to the LMSI, the GW-LMSI exploits the linkage disequilibrium between markers and the QTL produced when inbred lines are crossed.

#### 4.2.1 The GW-LMSI Parameters

In a similar manner to the LPSI, the main objective of the GW-LMSI is to predict the net genetic merit values of each individual and select the best individuals for further breeding. In the GW-LMSI context, the net genetic merit can be written as

$$H = \mathbf{w}'\mathbf{g} + \mathbf{w}'\_2\mathbf{m} = \begin{bmatrix} \mathbf{w}' & \mathbf{w}'\_2 \end{bmatrix} \begin{bmatrix} \mathbf{g} \\ \mathbf{m} \end{bmatrix} = \mathbf{a}'\_W \mathbf{z}\_W,\tag{4.21}$$

where $\mathbf{g}' = [g_1 \;\; \cdots \;\; g_t]$ ($t$ = number of traits) is the vector of breeding values, $\mathbf{w}' = [w_1 \;\; \cdots \;\; w_t]$ is the vector of economic weights associated with the breeding values, and $\mathbf{w}'_2 = [0_1 \;\; \cdots \;\; 0_m]$ is a null vector associated with the coded values of the markers $\mathbf{m}' = [m_1 \;\; \cdots \;\; m_m]$, where $m_j$ ($j = 1, 2, \ldots, m$; $m$ = number of markers) is the $j$th marker in the training population; $\mathbf{a}'_W = [\mathbf{w}' \;\; \mathbf{w}'_2]$ and $\mathbf{z}'_W = [\mathbf{g}' \;\; \mathbf{m}']$.

The GW-LMSI (IW) combines the phenotypic value and the molecular information linked to the individual traits to predict H values in each selection cycle. It can be written as

$$I_W = \boldsymbol{\beta}'_y\mathbf{y} + \boldsymbol{\beta}'_m\mathbf{m} = \begin{bmatrix} \boldsymbol{\beta}'_y & \boldsymbol{\beta}'_m \end{bmatrix} \begin{bmatrix} \mathbf{y} \\ \mathbf{m} \end{bmatrix} = \boldsymbol{\beta}'_W\mathbf{t}_W,\tag{4.22}$$

where $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_m$ are vectors of phenotypic and marker weights respectively; $\mathbf{y}' = [y_1 \;\; \cdots \;\; y_t]$ is the vector of phenotypic values and $\mathbf{m}$ was defined in Eq. (4.21); $\boldsymbol{\beta}'_W = [\boldsymbol{\beta}'_y \;\; \boldsymbol{\beta}'_m]$ and $\mathbf{t}'_W = [\mathbf{y}' \;\; \mathbf{m}']$.

The GW-LMSI selection response can be written as

$$R_W = k_I\sigma_H\rho_{I_W H} = k_I\sigma_H\frac{\mathbf{a}'_W\boldsymbol{\Psi}\boldsymbol{\beta}_W}{\sqrt{\mathbf{a}'_W\boldsymbol{\Psi}\mathbf{a}_W}\sqrt{\boldsymbol{\beta}'_W\boldsymbol{\Phi}\boldsymbol{\beta}_W}},\tag{4.23a}$$

where $k_I$ is the standardized selection differential of the GW-LMSI; $\sigma^2_H = \mathbf{a}'_W\boldsymbol{\Psi}\mathbf{a}_W$ and $Var(I_W) = \boldsymbol{\beta}'_W\boldsymbol{\Phi}\boldsymbol{\beta}_W$ are the variances of $H$ and $I_W$, whereas $\rho_{I_W H} = \dfrac{\mathbf{a}'_W\boldsymbol{\Psi}\boldsymbol{\beta}_W}{\sqrt{\mathbf{a}'_W\boldsymbol{\Psi}\mathbf{a}_W}\sqrt{\boldsymbol{\beta}'_W\boldsymbol{\Phi}\boldsymbol{\beta}_W}}$ and $\mathbf{a}'_W\boldsymbol{\Psi}\boldsymbol{\beta}_W$ are the correlation and the covariance between $H$ and $I_W$ respectively; $\boldsymbol{\Phi} = Var\begin{bmatrix} \mathbf{y} \\ \mathbf{m} \end{bmatrix} = \begin{bmatrix} \mathbf{P} & \mathbf{W}' \\ \mathbf{W} & \mathbf{M} \end{bmatrix}$ and $\boldsymbol{\Psi} = Var\begin{bmatrix} \mathbf{g} \\ \mathbf{m} \end{bmatrix} = \begin{bmatrix} \mathbf{C} & \mathbf{W}' \\ \mathbf{W} & \mathbf{M} \end{bmatrix}$ are block covariance matrices, where $\mathbf{P} = Var(\mathbf{y})$, $\mathbf{M} = Var(\mathbf{m})$, and $\mathbf{C} = Var(\mathbf{g})$ are the covariance matrices of the phenotypic values ($\mathbf{y}$), the coded values of the molecular markers ($\mathbf{m}$), and the genetic values ($\mathbf{g}$), whereas $\mathbf{W} = Cov(\mathbf{y}, \mathbf{m}) = Cov(\mathbf{g}, \mathbf{m})$ is the covariance matrix between $\mathbf{y}$ and $\mathbf{m}$, and between $\mathbf{g}$ and $\mathbf{m}$. Matrices $\mathbf{P}$ and $\mathbf{C}$ are of size $t \times t$, whereas matrices $\mathbf{M}$ and $\mathbf{W}$ are of size $m \times m$ and $m \times t$ respectively.

From a theoretical point of view, Crossa and Cerón-Rojas (2011) showed that matrix M can be written as

$$\mathbf{M} = \begin{bmatrix} 1 & (1 - 2\delta_{12}) & \cdots & (1 - 2\delta_{1m}) \\ (1 - 2\delta_{21}) & 1 & \cdots & (1 - 2\delta_{2m}) \\ \vdots & \vdots & \ddots & \vdots \\ (1 - 2\delta_{m1}) & (1 - 2\delta_{m2}) & \cdots & 1 \end{bmatrix},\tag{4.23b}$$

where (1 - 2δij) is the covariance (or correlation) and δij the recombination frequency between the ith and jth marker (i, j ¼ 1, 2, ..., m ¼ number of markers). According to Crossa and Cerón-Rojas (2011), matrix W can be written as

$$\mathbf{W} = \begin{bmatrix} (1 - 2r\_{11})a\_{11} & (1 - 2r\_{11})a\_{12} & \cdots & (1 - 2r\_{1N})a\_{1N\_Q} \\ (1 - 2r\_{21})a\_{21} & (1 - 2r\_{22})a\_{22} & \cdots & (1 - 2r\_{2N})a\_{2N\_Q} \\ \vdots & \vdots & \ddots & \vdots \\ (1 - 2r\_{l1})a\_{l1} & (1 - 2r\_{N2})a\_{l2} & \cdots & (1 - 2r\_{NN})a\_{lN\_Q} \end{bmatrix},\tag{4.23c}$$

where $(1 - 2r_{ik})\alpha_{qk}$ ($i = 1, 2, \ldots, m$; $k = 1, 2, \ldots, N_Q$, with $N_Q$ = number of QTL; $q = 1, 2, \ldots, t$) is the covariance between the $q$th trait and the $i$th marker; $r_{ik}$ is the recombination frequency between the $i$th marker and the $k$th QTL; and $\alpha_{qk}$ is the effect of the $k$th QTL over the $q$th trait.

The GW-LMSI expected genetic gain per trait can be written as

$$\mathbf{E}_{W} = k_I \frac{\boldsymbol{\Psi}\boldsymbol{\beta}_W}{\sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W}}. \tag{4.24}$$

All parameters in Eq. (4.24) were previously defined.

Matrix $\boldsymbol{\Phi}$ may be singular, i.e., its inverse ($\boldsymbol{\Phi}^{-1}$) may not exist, because matrix $\mathbf{W}$ is singular. Suppose that matrices $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$ are known; then, according to the LPSI theory, the GW-LMSI vector of coefficients ($\boldsymbol{\beta}_W$) that maximizes $\rho_{I_W H}$ can be written as

$$
\boldsymbol{\beta}_W = \boldsymbol{\Phi}^{-}\boldsymbol{\Psi}\mathbf{a}_W,\tag{4.25a}
$$

where matrix $\boldsymbol{\Phi}^{-}$ denotes a generalized inverse of $\boldsymbol{\Phi}$. By Eq. (4.25a), the maximized GW-LMSI selection response is

$$R_W = k_I \sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W}. \tag{4.25b}$$

Equations (4.25a) and (4.25b) show that the GW-LMSI is a direct application of the LPSI to MAS. By Eq. (4.25a), the maximized correlation between H and IW is

$$
\rho_{I_W H} = \frac{\sigma_{I_W}}{\sigma_H},
\tag{4.25c}
$$

where $\sigma_{I_W} = \sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W}$ is the standard deviation of $I_W$ and $\sigma_H = \sqrt{\mathbf{a}_W'\boldsymbol{\Psi}\mathbf{a}_W}$ is the standard deviation of $H$.

#### 4.2.2 Relationship Between the GW-LMSI and the LPSI

Matrix $\boldsymbol{\Phi}^{-}$ can be written as

$$\boldsymbol{\Phi}^{-} = \begin{bmatrix} \mathbf{L}^{-} & -\mathbf{L}^{-} \mathbf{W}^{\prime} \mathbf{M}^{-} \\ -\mathbf{M}^{-} \mathbf{W} \mathbf{L}^{-} & \mathbf{M}^{-} + \mathbf{M}^{-} \mathbf{W} \mathbf{L}^{-} \mathbf{W}^{\prime} \mathbf{M}^{-} \end{bmatrix}, \tag{4.26}$$

where $\mathbf{L}^{-}$ is a generalized inverse of matrix $\mathbf{L} = \mathbf{P} - \mathbf{W}'\mathbf{M}^{-}\mathbf{W}$, and $\mathbf{M}^{-}$ is a generalized inverse of matrix $\mathbf{M}$. In matrix $\boldsymbol{\Phi}^{-}$, the inverse of matrix $\mathbf{W}$ is not required and the standard inverse of matrix $\mathbf{M}$ ($\mathbf{M}^{-1}$) may exist. In the latter case, the standard inverse of matrix $\mathbf{L}$ ($\mathbf{L}^{-1}$) exists and can be written as $\mathbf{L}^{-1} = (\mathbf{P} - \mathbf{W}'\mathbf{M}^{-1}\mathbf{W})^{-1} = \mathbf{P}^{-1} + \mathbf{P}^{-1}\mathbf{W}'[\mathbf{M} - \mathbf{W}\mathbf{P}^{-1}\mathbf{W}']^{-1}\mathbf{W}\mathbf{P}^{-1}$ (Searle et al. 2006).

By Eq. (4.26) and because $\mathbf{w}_2' = [0_1 \;\cdots\; 0_N]$, the vector components of $\boldsymbol{\beta}_W' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_m']$, or $\boldsymbol{\beta}_W = \boldsymbol{\Phi}^{-}\boldsymbol{\Psi}\mathbf{a}_W$, can be written as

$$\boldsymbol{\beta}_y = [\mathbf{L}^{-}\mathbf{C} - \mathbf{L}^{-}\mathbf{W}^{\prime}\mathbf{M}^{-}\mathbf{W}]\mathbf{w} \tag{4.27}$$

and

$$\boldsymbol{\beta}_m = [(\mathbf{M}^{-} + \mathbf{M}^{-}\mathbf{W}\mathbf{L}^{-}\mathbf{W}^{\prime}\mathbf{M}^{-})\mathbf{W} - \mathbf{M}^{-}\mathbf{W}\mathbf{L}^{-}\mathbf{C}]\mathbf{w}, \tag{4.28}$$

where $\mathbf{w}$ is the vector of economic weights. Suppose that there is no marker information; then, matrices $\mathbf{M}$ and $\mathbf{W}$ are null and Eq. (4.27) is equal to $\boldsymbol{\beta}_y = \mathbf{P}^{-1}\mathbf{C}\mathbf{w} = \mathbf{b}$ (the LPSI vector of coefficients), whereas $\boldsymbol{\beta}_m = \mathbf{0}$ and $R_W = k_I\sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W} = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = R_I$, the LPSI selection response. Now suppose that the markers explain all the genetic variability; in this case, $\boldsymbol{\beta}_y = \mathbf{0}$ and $\boldsymbol{\beta}_m = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{Y}$, the matrix of linear regression coefficients in the multivariate context, where $(\mathbf{X}'\mathbf{X})^{-}$ is a generalized inverse matrix of $\mathbf{X}'\mathbf{X}$ and $\mathbf{Y}$ is a matrix of phenotypic observations.
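The block structure of Eqs. (4.26) to (4.28) can be checked numerically in the simplest possible setting. The sketch below assumes one trait and one marker, so that $\mathbf{P}$, $\mathbf{C}$, $\mathbf{M}$, $\mathbf{W}$, and $\mathbf{w}$ are scalars; all numeric values are hypothetical and only illustrate the algebra. In this scalar case the two coefficients also satisfy $\beta_m = (W/M)(w - \beta_y)$, which the last line prints for comparison.

```python
# Scalar special case of Eqs. (4.26)-(4.28): one trait (t = 1), one marker (m = 1).
# All numeric values are hypothetical and chosen only for illustration.
P, C, M, W, w = 10.0, 4.0, 1.0, 1.5, 1.0

L = P - W * W / M  # scalar version of L = P - W'M^-W

# Blocks of the inverse of Phi = [[P, W], [W, M]] as given by Eq. (4.26)
inv11 = 1.0 / L
inv12 = -W / (L * M)
inv21 = -W / (M * L)
inv22 = 1.0 / M + W * W / (M * M * L)

# Phi multiplied by the block inverse should give the 2 x 2 identity matrix
product = [
    [P * inv11 + W * inv21, P * inv12 + W * inv22],
    [W * inv11 + M * inv21, W * inv12 + M * inv22],
]

# Index coefficients from Eqs. (4.27) and (4.28)
beta_y = (C / L - W * W / (L * M)) * w
beta_m = ((1.0 / M + W * W / (M * M * L)) * W - W * C / (M * L)) * w

print(product)
print(beta_y, beta_m, (W / M) * (w - beta_y))
```

Setting `W = 0.0` in this sketch reproduces the no-marker case described above, where the phenotypic coefficient reduces to `C / P`.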

#### 4.2.3 Statistical Properties of GW-LMSI

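Assume that $H$ and $I_W$ have a bivariate joint normal distribution, $\boldsymbol{\beta}_W = \boldsymbol{\Phi}^{-}\boldsymbol{\Psi}\mathbf{a}_W$, and $\mathbf{P}$, $\mathbf{C}$, $\mathbf{M}$, $\mathbf{W}$, and $\mathbf{w}$ are known; then, the statistical GW-LMSI properties are the same as the LMSI properties.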


According to Lange and Whittaker (2001), GW-LMSI efficiency should be greater than LMSI efficiency. However, this would be true only if matrices P, C, M, and W are known and trait heritability is very low.

#### 4.3 Estimating the LMSI Parameters

When covariance matrices P, C, and S, and the vector of economic weights (w) are known, there is no error in the estimation of the LMSI parameters (selection response, expected genetic gain, etc.); the same is true for the GW-LMSI when, in addition to P, C, and w, the covariance matrices M and W are known. In such cases, the relative efficiency of the LMSI (GW-LMSI) depends only on the heritability of the traits and on the portion of phenotypic variation associated with markers. Using simulated data, Lange and Whittaker (2001) found that GW-LMSI efficiency was higher than LMSI efficiency when trait heritability was 0.2 and matrices P, C, M, and W were known. When P, C, S, M, and W are unknown, it is necessary to estimate them; then, the LMSI and GW-LMSI vector of coefficients and the effects associated with markers are estimated with some error. This error leads to lower LMSI and GW-LMSI efficiency than expected under the assumption that the parameters are known; however, in the latter case, Lange and Whittaker (2001) also found that GW-LMSI efficiency was greater than that of the LMSI when trait heritability was 0.05. Moreover, in the LMSI there is additional bias in the estimation of the parameters because only markers with significant effects are included in the index (Moreau et al. 1998).

In Chap. 2, we described the restricted maximum likelihood (REML) method for estimating matrices $\mathbf{P}$ and $\mathbf{C}$. Some authors (Lande and Thompson 1990; Charcosset and Gallais 1996; Hospital et al. 1997; Moreau et al. 1998, 2007) have described methods for estimating marker scores, the variance of the marker scores, the LMSI vector of coefficients, etc., in the context of one trait; however, up to now there have been no reports on the estimation of matrix $\mathbf{S}$ in the multi-trait case. Lange and Whittaker (2001) only indicated that matrix $\mathbf{S}$ can be estimated as $\widehat{\mathbf{S}} = \mathrm{Var}(\widehat{\mathbf{s}})$, where $\widehat{\mathbf{s}}$ is a vector of estimated marker scores associated with several individual traits.

The main problems associated with the estimated LMSI parameters are:


When the first point is true, the estimated LMSI selection response and efficiency could be negative because the estimated matrix $\widehat{\mathbf{T}}_M = \begin{bmatrix}\widehat{\mathbf{P}} & \widehat{\mathbf{S}}\\ \widehat{\mathbf{S}} & \widehat{\mathbf{S}}\end{bmatrix}$ is not positive definite (all eigenvalues positive) and the estimated matrix $\widehat{\mathbf{Z}}_M = \begin{bmatrix}\widehat{\mathbf{C}} & \widehat{\mathbf{S}}\\ \widehat{\mathbf{S}} & \widehat{\mathbf{S}}\end{bmatrix}$ is not positive semi-definite (no negative eigenvalues). In addition, the results can lead to all weights being placed on the molecular score, and the weights on the phenotypic values can be negative (Moreau et al. 2007). When the second point is true, the variance of the marker scores is not useful. The two problems indicated above could be caused by using the same data set to select markers and to estimate marker effects, and there is no simple way of solving them. Lande and Thompson (1990) proposed that the markers used to obtain $\widehat{\mathbf{S}}$ be selected a priori as those with the most highly significant partial regression coefficients from among all the markers in the linkage group analyzed in the previous generation. Zhang and Smith (1992, 1993) proposed using two independent sets of markers: one to estimate marker effects and the other to select markers. Additional solutions to these problems were described by Moreau et al. (2007).

In this subsection, we describe methods (in the univariate and multivariate context) for estimating molecular marker effects, marker scores, and their variance and covariance, and for estimating the LMSI and GW-LMSI vector of coefficients, selection response, expected genetic gain, and accuracy. This subsection is only for illustration; we use the same data set to select markers, and to estimate marker effects and the variance of marker scores.

#### 4.3.1 Estimating the Marker Score

According to Eqs. (4.11) and (4.17b), when the vector of economic weights is equal to $\mathbf{a}' = [1 \;\; 0]$, the LMSI for the $i$th trait $y_i$ ($i = 1, 2, \ldots, t$; $t$ = number of traits) value can be written as $I_{M_{li}} = s_i + \beta_{y_i}(y_i - s_i)$ ($l = 1, 2, \ldots, n$; $n$ = number of individuals or genotypes), where $\beta_{y_i} = \dfrac{\sigma_{g_i}^2 - \sigma_{s_i}^2}{\sigma_{y_i}^2 - \sigma_{s_i}^2} = \dfrac{h_i^2(1 - q_i)}{1 - q_i h_i^2}$ is the LMSI coefficient, $h_i^2 = \dfrac{\sigma_{g_i}^2}{\sigma_{y_i}^2}$ is the heritability of the $i$th trait, and $q_i = \dfrac{\sigma_{s_i}^2}{\sigma_{g_i}^2}$ is the proportion of genetic variance explained by the QTL or markers associated with the $i$th trait; $s_i = \sum_{j=1}^{M} \theta_j x_j$ ($j = 1, 2, \ldots, M$; $M$ = number of selected markers) is the $i$th individual trait marker score; and $\sigma_{y_i}^2$, $\sigma_{g_i}^2$, and $\sigma_{s_i}^2$ are the $i$th variances of the phenotypic, genetic, and marker score values respectively.

The simplest way of estimating the $i$th marker score $s_i$ is to perform a multiple linear regression of phenotypic values ($y_i$) on the coded values of the markers ($x_j$) and then select the markers statistically linked to the $i$th QTL that explain most of the variability in the regression model and use them to construct $s_i = \sum_{j \in M} \theta_j x_j$. We can fit the model $y_i^{*} = \sum_{j \in M} \theta_j x_j + e$, where $y_i^{*} = y_i - \bar{y}_i$ and $\bar{y}_i$ is the average value of the $i$th trait, by maximum likelihood or least squares. When estimating $\theta_j$, the main problem is to choose the set of markers $M$ based on criteria for declaring markers as significant and then use the estimated values of $\theta_j$ ($\widehat{\theta}_j$) to estimate the $i$th marker score $s_i$ as $\widehat{s}_i = \sum_{j \in M} \widehat{\theta}_j x_j$. The values of $\widehat{s}_i$ may increase or decrease according to the number of markers ($x_j$) included in the model, and $\widehat{s}_i$ affects LMSI selection response and efficiency by means of the estimated variance of $\widehat{s}_i$ ($\widehat{\sigma}_{\widehat{s}_i}^2$) (Figs. 4.1 and 4.2).

According to the least squares method of estimation, $\widehat{\boldsymbol{\theta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}^{*}$ is an estimator of the vector of regression coefficients $\boldsymbol{\theta}' = [\theta_1 \;\; \theta_2 \;\; \cdots \;\; \theta_m]$, where $m$ ($m < n$) is the number of markers, $\mathbf{X}$ is an $n \times m$ matrix of coded marker values (e.g., 1, 0, and -1 for marker genotypes AA, Aa, and aa respectively) and $\mathbf{y}^{*}$ is an $n \times 1$ vector of phenotypic values centered on its average value. Only a subset $M$ ($M < m$) of the $m$ markers is statistically linked to the QTL and then only a subset $M$ of the estimated vector $\widehat{\boldsymbol{\theta}}$ values is selected to estimate $s_i$ as $\widehat{s}_i = \sum_{j=1}^{M} \widehat{\theta}_j x_j$.

To illustrate how to obtain $\widehat{s}_i = \sum_{j \in M} \widehat{\theta}_j x_j$, we use a real maize (*Zea mays*) F2 population with 247 genotypes (each one with two repetitions), 195 molecular markers, and four traits evaluated in one environment: grain yield (GY, ton ha$^{-1}$), plant height (PHT, cm), ear height (EHT, cm), and anthesis day (AD, days). In an F2 population, the marker homozygous loci for the allele from the first parental line can be coded by 1, whereas the marker homozygous loci for the allele from the second parental line can be coded by -1, and the marker heterozygous loci by 0.
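The least squares step described above can be sketched on a small simulated data set. The example below is only an illustration under stated assumptions: the marker matrix is a hypothetical balanced design (every combination of three markers coded 1, 0, -1), the marker effects `true_theta` are invented, and the phenotypes are generated without residual noise so that the estimates can be compared directly with the simulated effects. The helper names `solve` and `marker_effects` are not taken from the source.

```python
from itertools import product

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def marker_effects(X, y):
    """Least squares estimate (X'X)^{-1} X'y*, with y* the mean-centred phenotype."""
    n, m = len(X), len(X[0])
    y_bar = sum(y) / n
    y_star = [v - y_bar for v in y]
    xtx = [[sum(X[l][i] * X[l][j] for l in range(n)) for j in range(m)] for i in range(m)]
    xty = [sum(X[l][i] * y_star[l] for l in range(n)) for i in range(m)]
    return solve(xtx, xty)

# Hypothetical data: every combination of three markers coded 1, 0, -1 (27 genotypes)
X = [list(codes) for codes in product([1, 0, -1], repeat=3)]
true_theta = [5.0, -4.0, 2.0]  # hypothetical marker effects
y = [150.0 + sum(t * x for t, x in zip(true_theta, row)) for row in X]  # noise-free phenotypes

theta_hat = marker_effects(X, y)
# Estimated marker score of each genotype: sum of theta_hat_j * x_j
marker_scores = [sum(t * x for t, x in zip(theta_hat, row)) for row in X]
print(theta_hat)
```

With real data the phenotypes contain residual variation and only the markers declared significant would be kept when forming the marker scores.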

Fig. 4.1 Efficiency of the linear molecular selection index with respect to phenotypic selection for the one-trait case for different values of the variance of the marker score when the phenotypic and genetic variances are fixed

Fig. 4.2 Selection response values of the linear molecular selection index for the one-trait case for different values of the variance of the marker score when the phenotypic and genetic variances are fixed

For this example, we used trait PHT. Only seven markers were statistically linked to PHT. The estimated vector of regression coefficients for these seven markers was $\widehat{\boldsymbol{\theta}}' = [5.46 \;\; -4.54 \;\; 0.98 \;\; 7.39 \;\; -7.75 \;\; -1.91 \;\; -3.53]$. Table 4.1 presents the first 20 genotypes, the coded values of the seven selected markers, and the first 20 estimated $\widehat{s}_{PHT}$ values of the 247 genotypes in the maize (*Zea mays*) F2 population. According to $\widehat{\boldsymbol{\theta}}'$ and the coded values of the seven markers, the first estimated $\widehat{s}_{PHT}$ value was obtained as $\widehat{s}_{PHT_1} = -1.91(1) + (-3.53)(-1) = 1.62$; the second estimated $\widehat{s}_{PHT}$ value was obtained as $\widehat{s}_{PHT_2} = 5.46(-1) + (-4.54)(-1) + (-1.91)(-1) = 0.99$, etc. The 20th estimated $\widehat{s}_{PHT}$ value was obtained as $\widehat{s}_{PHT_{20}} = -3.53(-1) = 3.53$. This estimation procedure is valid for any number of genotypes and markers.

Table 4.1 Number of selected genotypes, coded values of seven selected markers, and estimated marker score values obtained from a maize (*Zea mays*) F2 population with 247 genotypes and 195 molecular markers
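The three marker scores worked out above can be reproduced with a short sketch. The coded marker values used here are not copied from Table 4.1; they are inferred from the arithmetic in the text, assuming that every marker that does not appear in a sum was coded 0 for that genotype.

```python
# Estimated regression coefficients of the seven selected markers (from the text)
theta_hat = [5.46, -4.54, 0.98, 7.39, -7.75, -1.91, -3.53]

# Coded marker values inferred from the worked sums in the text
# (assumption: markers absent from a sum are coded 0 for that genotype).
coded = {
    1: [0, 0, 0, 0, 0, 1, -1],
    2: [-1, -1, 0, 0, 0, -1, 0],
    20: [0, 0, 0, 0, 0, 0, -1],
}

def marker_score(theta, x):
    """Marker score: sum over markers of theta_hat_j * x_j."""
    return sum(t * v for t, v in zip(theta, x))

scores = {genotype: round(marker_score(theta_hat, x), 2) for genotype, x in coded.items()}
print(scores)
```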

Figure 4.3 shows the distribution of the 247 estimated marker scores associated with traits PHT and EHT of the maize F2 population. Note that the estimated marker score values approach normal distribution.

#### 4.3.2 Estimating the Variance of the Marker Score

Fig. 4.3 Distribution of the marker scores associated with traits (a) plant height and (b) ear height of a maize (*Zea mays*) F2 population. Note that the distribution of frequencies of the marker score values approaches normal distribution

There are many methods of estimating the variance of the marker score associated with the $i$th trait ($\sigma_{s_i}^2$); the first one was proposed by Lande and Thompson (1990). According to these authors, $\sigma_{s_i}^2$ can be estimated as

$$
\widehat{\sigma}_{\widehat{s}_i}^2 = \widehat{\boldsymbol{\theta}}_i^{\prime} \mathbf{M}_i \widehat{\boldsymbol{\theta}}_i - \frac{M \widehat{\sigma}_{e_i}^2}{n},
\tag{4.29}
$$

where $\widehat{\boldsymbol{\theta}}_i$ is the estimated vector of regression coefficients of the selected markers; $\mathbf{M}_i = \frac{2}{n}\mathbf{X}_i'\mathbf{X}_i$ is the $M \times M$ covariance matrix of the selected markers that are statistically linked to the $i$th trait marker loci; $\widehat{\sigma}_{e_i}^2 = \dfrac{\mathbf{y}'(\mathbf{I} - \mathbf{H})\mathbf{y}}{n - M - 1}$ is the unbiased estimated variance of the residuals; $\mathbf{H} = \mathbf{X}_i(\mathbf{X}_i'\mathbf{X}_i)^{-1}\mathbf{X}_i'$ is the projection matrix of the regression, so that $\mathbf{y}'(\mathbf{I} - \mathbf{H})\mathbf{y}$ is the residual sum of squares; $\mathbf{I}$ is an $n \times n$ identity matrix; $M$ is the number of selected markers statistically linked to the QTL; and $\mathbf{X}_i$ is an $n \times M$ matrix with the coded values of the selected markers. According to Lande and Thompson (1990), Eq. (4.29) is an unbiased estimator of $\sigma_{s_i}^2$ and its variance can be written as

$$\text{Var}\left(\widehat{\sigma}_{\widehat{s}_{i}}^{2}\right) = \frac{4\sigma_{s_{i}}^{2}\sigma_{e_{i}}^{2}}{n} + \frac{2M\left(\sigma_{e_{i}}^{2}\right)^{2}}{n^{2}} + \frac{2M^{2}\left(\sigma_{e_{i}}^{2}\right)^{2}}{n^{2}(n-M)},\tag{4.30}$$

which tends to zero when n, the number of genotypes or individuals, is very high.

From Eq. (4.29), it is possible to obtain an estimator of the covariance between the ith and jth marker scores when the number of selected markers statistically linked to the QTL is the same in the ith and jth traits. Thus, by Eq. (4.29), the covariance between the ith and jth marker scores can be estimated as

$$
\widehat{\sigma}_{\widehat{s}_{ij}} = \widehat{\boldsymbol{\theta}}_i^{\prime} \mathbf{M}_{ij} \widehat{\boldsymbol{\theta}}_j - \frac{M \widehat{\sigma}_{e_{ij}}}{n},
\tag{4.31}
$$

where $\widehat{\boldsymbol{\theta}}_i$ and $\widehat{\boldsymbol{\theta}}_j$ are the estimated vectors of regression coefficients of the selected markers associated with the $i$th and $j$th trait loci respectively; $\mathbf{M}_{ij} = \frac{2}{n}\mathbf{X}_i'\mathbf{X}_j$ is the $M \times M$ covariance matrix of the markers statistically linked to the $i$th and $j$th trait marker loci; $\mathbf{X}_i$ and $\mathbf{X}_j$ are $n \times M$ matrices with the coded values of the selected markers associated with the $i$th and $j$th trait loci respectively; $\widehat{\sigma}_{e_{ij}} = \dfrac{\mathbf{y}_i'(\mathbf{I} - \mathbf{H}_{ij})\mathbf{y}_j}{n - M - 1}$ is the estimated covariance of the residuals between the $i$th ($\mathbf{y}_i$) and $j$th ($\mathbf{y}_j$) trait values; $\mathbf{H}_{ij} = \mathbf{X}_i(\mathbf{X}_i'\mathbf{X}_j)^{-1}\mathbf{X}_j'$; $\mathbf{I}$ is an $n \times n$ identity matrix; and $M$ is the number of selected markers statistically linked to the QTL.

According to the PHT values described in Sect. 4.3.1 of this chapter, $M = 7$, $n = 247$, $\widehat{\sigma}_{e_i}^2 = 180.80$, and $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 48.23$ (Eq. 4.29). Note that $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 < \widehat{\sigma}_{g_{PHT}}^2$, where $\widehat{\sigma}_{g_{PHT}}^2 = 83.0$ is an estimate of the genetic variance of PHT. The estimated portion of the genetic variance attributable to $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 48.23$ was $\widehat{q}_{PHT} = \dfrac{48.23}{83} = 0.5811$; that is, the seven markers explain 58.11% of the genetic variance associated with PHT.
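The size of the bias correction in Eq. (4.29) and the sampling variance in Eq. (4.30) can be gauged from the PHT figures quoted above. The sketch below is only indicative: the quadratic form $\widehat{\boldsymbol{\theta}}'\mathbf{M}_i\widehat{\boldsymbol{\theta}}$ is not reported in the text, so it is backed out from the published estimate, and Eq. (4.30) is evaluated by plugging the estimates in for the unknown variances, which is an assumption of this example.

```python
M, n = 7, 247      # number of selected markers and of genotypes
var_e = 180.80     # estimated residual variance
var_s = 48.23      # estimated marker score variance from Eq. (4.29)
var_g = 83.0       # estimated genetic variance of PHT

correction = M * var_e / n      # term subtracted in Eq. (4.29)
quad_form = var_s + correction  # implied value of theta' M_i theta (not reported in the text)
q_hat = var_s / var_g           # portion of genetic variance explained by the markers

# Eq. (4.30), with estimates plugged in for the unknown variances (assumption)
var_of_estimator = (
    4 * var_s * var_e / n
    + 2 * M * var_e ** 2 / n ** 2
    + 2 * M ** 2 * var_e ** 2 / (n ** 2 * (n - M))
)

print(round(correction, 3), round(quad_form, 3), round(q_hat, 4))
print(round(var_of_estimator, 2))
```

Under these plug-in values the first term of Eq. (4.30) dominates, so the precision of the estimator is driven mainly by the ratio of the residual variance to the number of genotypes.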

Charcosset and Gallais (1996) considered two possible methods of estimating $\sigma_{s_i}^2$ based on the coefficient of multiple determination or squared multiple correlation $R^2$ (note that in this case $R^2$ is not the square of the selection response). The coefficient $R^2$ gives the portion of the total variation in the phenotypic values that is "explained" by, or attributable to, the markers and can be written as

$$R^2 = \frac{\widehat{\boldsymbol{\theta}}^{\prime}\mathbf{X}^{\prime}\mathbf{y} - n\bar{y}^2}{\mathbf{y}^{\prime}\mathbf{y} - n\bar{y}^2} = \frac{\widehat{\sigma}_s^2}{\widehat{\sigma}_y^2},\tag{4.32a}$$

where $\widehat{\boldsymbol{\theta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2$ is the overall regression sum of squares adjusted for the intercept and $\mathbf{y}'\mathbf{y} - n\bar{y}^2$ is the total sum of squares adjusted for the mean. The coefficient $R^2$ is equal to 1 if the fitted equation $y_i = \theta_0 + \sum_{j \in M} \theta_j x_j + e_i$ passes through all the data points, so that all residuals are null; then, the markers explain all the phenotypic variance. At the other extreme, $R^2$ is zero if $\widehat{y}_i = \widehat{\theta}_0$ and the estimated regression coefficients are null, i.e., $\widehat{\theta}_1 = \widehat{\theta}_2 = \cdots = \widehat{\theta}_M = 0$. In the latter case, markers do not affect the phenotypic observations and the variance of the marker score values is zero. Thus, the $R^2$ values are between 0 and 1, i.e., $0 \le R^2 \le 1.0$. Equation (4.32a) is useful for estimating $\sigma_{s_i}^2$ as $\widehat{\sigma}_{y_i}^2 \sum_{j=1}^{M} R_j^2 = \widehat{\sigma}_s^2$, where $R_j^2$ is the estimated $R^2$ value of the $j$th marker and $\widehat{\sigma}_{y_i}^2$ is the phenotypic variance of the $i$th trait; however, this is a biased estimator of $\sigma_{s_i}^2$ (Hospital et al. 1997).

Charcosset and Gallais (1996) and Hospital et al. (1997) proposed an unbiased estimator of $\sigma_{s_i}^2$ based on all the selected markers using the adjusted coefficient of multiple determination, i.e.,

$$R_{Adj}^2 = 1 - \frac{n-1}{n-M-1} \left(1 - R^2\right) = \frac{\widehat{\sigma}_s^2}{\widehat{\sigma}_y^2},\tag{4.32b}$$

whence we can obtain an unbiased estimator of $\sigma_{s_i}^2$ as $\widehat{\sigma}_{y}^2 R_{Adj}^2 = \widehat{\sigma}_{\widehat{s}_i}^2$ by jointly using all the markers that affect the phenotypic values. The problem with Eq. (4.32b) is that the $R_{Adj}^2$ values could be negative; in that case, the estimated value of $\sigma_{s_i}^2$ would also be negative. One additional problem with Eq. (4.32b) is that the $R_{Adj}^2$ values can produce $\widehat{\sigma}_s^2$ values that are higher than those of the estimated variance of the breeding values $\widehat{\sigma}_g^2$.
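The possibility of a negative estimate mentioned above is easy to see numerically. The following sketch implements Eq. (4.32b) and evaluates it for a hypothetical weak fit ($R^2 = 0.02$), using the sample size, number of markers, and phenotypic variance of the PHT example; the function names and the value 0.02 are illustrative assumptions.

```python
def adjusted_r2(r2, n, m):
    """Adjusted coefficient of multiple determination, Eq. (4.32b)."""
    return 1.0 - (n - 1) / (n - m - 1) * (1.0 - r2)

def marker_score_variance(var_y, r2, n, m):
    """Marker score variance estimated as phenotypic variance times adjusted R^2."""
    return var_y * adjusted_r2(r2, n, m)

n, m, var_y = 247, 7, 191.81   # values of the PHT example
weak_fit_r2 = 0.02             # hypothetical R^2 of a weak regression

weak_adj = adjusted_r2(weak_fit_r2, n, m)
weak_var = marker_score_variance(var_y, weak_fit_r2, n, m)
print(weak_adj, weak_var)      # both are negative for this weak fit
```

Whenever $R^2$ is smaller than $M/(n-1)$ the adjusted coefficient falls below zero, which is why the estimator can return a negative variance when the selected markers explain little of the phenotypic variation.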

Using Eqs. (4.32a) and (4.32b), we can estimate $\sigma_{s_i}^2$, but from them it is not clear how we can estimate the covariance between two different estimated marker score values.

Consider the case of the PHT values described in Sect. 4.3.1 of this chapter, where $M = 7$, $n = 247$, and the estimated variance of PHT was $\widehat{\sigma}_{PHT}^2 = 191.81$. The estimated values of $R^2$ for each of the seven markers were 0.0038, 0.0005, 0.006, 0.0013, 0.0036, 0.0114, and 0.0298, whence, by multiplying each estimated $R^2$ value by $\widehat{\sigma}_{PHT}^2 = 191.81$ and summing the results, we found that the estimated value of $\sigma_{s_{PHT}}^2$ was $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 9.78$. In this case, the estimated portion of the genetic variance attributable to $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 9.78$ was $\widehat{q}_{PHT} = \dfrac{9.78}{83} = 0.1178$; thus, when we estimated $\sigma_{s_{PHT}}^2$ according to Eq. (4.32a), the seven markers explained only 11.78% of the genetic variance associated with PHT.

The estimated value of $R_{Adj}^2$ for the seven markers jointly was 0.06, whence $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = (191.81)(0.06) = 11.50$ is an estimate of $\sigma_{s_{PHT}}^2$. In the latter case, the estimated portion of the genetic variance attributable to $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 11.50$ was $\widehat{q}_{PHT} = \dfrac{11.5}{83} = 0.1385$; that is, according to Eq. (4.32b), the seven markers explain 13.85% of the genetic variance associated with PHT.

One additional way of estimating the variance of the marker score $\sigma_{s_i}^2$ was proposed by Lange and Whittaker (2001) as

$$\frac{1}{n-1}\sum_{i=1}^{n}\left(\widehat{s}_{i}-\widehat{\mu}_{s_{i}}\right)^{2},\tag{4.33}$$

where $\widehat{s}_i = \sum_{j=1}^{M} \widehat{\theta}_j x_j$ and $\widehat{\mu}_{s_i}$ is the mean of the $\widehat{s}_i$ values. The covariance between the $i$th and $j$th marker scores can be estimated as the cross products of the marker score values divided by $n - 1$. Note that in this case, the number of markers associated with the $i$th and $j$th traits may be different.

For the PHT values described in Sect. 4.3.1 of this chapter, where $n = 247$, the estimated value of $\sigma_{s_i}^2$ was $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 15.75$ and the estimated portion of the genetic variance attributable to $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 15.75$ was $\widehat{q}_{PHT} = \dfrac{15.75}{83} = 0.1897$. That is, the seven markers jointly explain 18.97% of the genetic variance associated with PHT according to Eq. (4.33).
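Equation (4.33) and its covariance analogue amount to the usual sample variance and covariance of the estimated marker scores. A minimal sketch follows; the two score vectors are hypothetical and very short, and the cross products are taken on mean-centred scores, which is the reading assumed here.

```python
def sample_cov(a, b):
    """Sum of centred cross products divided by n - 1 (Eq. 4.33 when a is b)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

# Hypothetical estimated marker scores of four genotypes for two traits
scores_trait_i = [1.0, 2.0, 3.0, 4.0]
scores_trait_j = [2.0, 1.0, 4.0, 3.0]

# Estimated covariance matrix of marker scores for the two traits
S_hat = [
    [sample_cov(scores_trait_i, scores_trait_i), sample_cov(scores_trait_i, scores_trait_j)],
    [sample_cov(scores_trait_j, scores_trait_i), sample_cov(scores_trait_j, scores_trait_j)],
]
print(S_hat)
```

Because each score vector is built separately, nothing in this calculation requires the two traits to share the same number of selected markers.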

#### 4.3.3 Estimating LMSI Selection Response and Efficiency

With the estimated phenotypic variance ($\widehat{\sigma}_{PHT}^2 = 191.81$), the estimated genetic variance ($\widehat{\sigma}_{g_{PHT}}^2 = 83.0$), and the estimated marker score variances $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 48.23$ (Eq. 4.29), $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 9.78$ (Eq. 4.32a), $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 11.50$ (Eq. 4.32b), and $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 15.75$ (Eq. 4.33), we can estimate the LMSI coefficient, selection response, and efficiency.

Using the estimated value $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 48.23$ obtained with Eq. (4.29), it is possible to estimate the LMSI weight as $\widehat{\beta}_{PHT} = \dfrac{\widehat{\sigma}_{g_{PHT}}^2 - \widehat{\sigma}_{\widehat{s}_{PHT}}^2}{\widehat{\sigma}_{PHT}^2 - \widehat{\sigma}_{\widehat{s}_{PHT}}^2} = \dfrac{83.0 - 48.23}{191.81 - 48.23} = 0.242$, whereas for $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 9.78$, $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 11.50$, and $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 15.75$, the estimated values of $\beta_{PHT}$ were 0.402, 0.40, and 0.382 respectively. The latter results indicate that the estimated values of $\beta_{PHT}$ associated with the phenotypic values tend to decrease when the estimated values of the variance of the marker score increase. This means that at the limit, when all the genetic variance is explained by the markers, the estimated values of $\beta_{PHT}$ are zero and the estimated LMSI is equal to $\widehat{I}_M = \widehat{s}$. Thus, for trait PHT, when the estimated values of $\beta_{PHT}$ are not zero, the estimated LMSI can be written as $\widehat{I}_{M_{PHT}} = \widehat{s}_{PHT} + \widehat{\beta}_{PHT}(PHT_i - \widehat{s}_{PHT})$. The $\widehat{I}_{M_{PHT}}$ values are used to predict, rank, and select the net genetic merit value of each individual candidate for selection.
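The four LMSI weights quoted above follow directly from the one-trait formula. The sketch below recomputes them from the reported variances; the dictionary labels and function names are illustrative and not part of the source.

```python
var_y, var_g = 191.81, 83.0   # estimated phenotypic and genetic variances of PHT

# Estimated marker score variances reported in the text, by estimation method
var_s_by_method = {"Eq. 4.29": 48.23, "Eq. 4.32a": 9.78, "Eq. 4.32b": 11.50, "Eq. 4.33": 15.75}

def lmsi_weight(var_g, var_y, var_s):
    """LMSI coefficient on the phenotype: (var_g - var_s) / (var_y - var_s)."""
    return (var_g - var_s) / (var_y - var_s)

def lmsi_value(score, phenotype, weight):
    """One-trait LMSI value: s + beta * (y - s)."""
    return score + weight * (phenotype - score)

weights = {method: lmsi_weight(var_g, var_y, v) for method, v in var_s_by_method.items()}
for method, weight in weights.items():
    print(method, round(weight, 3))
```

When the marker score variance equals the genetic variance, `lmsi_weight` returns zero and `lmsi_value` returns the marker score itself, which is the limiting case described in the text.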

Based on the result $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 48.23$ obtained with Eq. (4.29) and using a selection intensity of 10% ($k_I = 1.755$), the estimated LMSI selection response can be obtained as

$$\begin{split} \widehat{R}_{M} &= k_{I} \sqrt{\frac{\widehat{\sigma}_{g}^{2} \left( \widehat{\sigma}_{g}^{2} - \widehat{\sigma}_{s}^{2} \right) + \widehat{\sigma}_{s}^{2} \left( \widehat{\sigma}_{y}^{2} - \widehat{\sigma}_{g}^{2} \right)}{\widehat{\sigma}_{y}^{2} - \widehat{\sigma}_{s}^{2}}} \\ &= 1.755 \sqrt{\frac{83(83 - 48.23) + 48.23(191.81 - 83)}{191.81 - 48.23}} \\ &= 1.755 \sqrt{56.65} = 13.21. \end{split}$$

In a similar manner, using the result $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 15.75$, the estimated selection response was $\widehat{R}_M = 1.755\sqrt{\dfrac{83(83 - 15.75) + 15.75(191.81 - 83)}{191.81 - 15.75}} = 1.755\sqrt{41.44} = 11.30$. With $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 9.78$ and $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 11.50$, the estimated values of the LMSI selection responses were 10.99 and 11.10 respectively. The latter results indicate that the estimated values of the LMSI selection responses tend to increase when the estimated values of the variance of the marker score increase.
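The selection response formula used in these calculations can be wrapped in a small function and evaluated for each estimate of the marker score variance. This is a sketch based only on the one-trait expression and the variances given above; the function name is an assumption.

```python
import math

def lmsi_response(k, var_g, var_y, var_s):
    """One-trait LMSI selection response for selection differential k."""
    numerator = var_g * (var_g - var_s) + var_s * (var_y - var_g)
    return k * math.sqrt(numerator / (var_y - var_s))

k_I, var_y, var_g = 1.755, 191.81, 83.0

# Responses for the four estimated marker score variances, in increasing order
responses = {v: lmsi_response(k_I, var_g, var_y, v) for v in (9.78, 11.50, 15.75, 48.23)}
for var_s, response in responses.items():
    print(var_s, round(response, 2))
```

Setting `var_s = 0` in `lmsi_response` gives `k * var_g / sqrt(var_y)`, so the marker-free case falls out of the same expression.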

We can estimate LMSI versus phenotypic efficiency for one trait as $\widehat{\lambda}_M = \sqrt{\dfrac{\widehat{q}}{\widehat{h}^2} + \dfrac{(1 - \widehat{q})^2}{1 - \widehat{q}\widehat{h}^2}}$, where $\widehat{h}^2$ is the estimated trait heritability and $\widehat{q} = \dfrac{\widehat{\sigma}_s^2}{\widehat{\sigma}_g^2}$ is the estimated portion of additive genetic variance explained by the markers. When $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 48.23$, $\widehat{q}_{PHT} = \dfrac{48.23}{83} = 0.5811$, and $\widehat{h}^2 = 0.433$, the estimated LMSI efficiency was $\widehat{\lambda}_M = \sqrt{1.58} = 1.25$. For $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 15.75$, $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 9.78$, and $\widehat{\sigma}_{\widehat{s}_{PHT}}^2 = 11.50$, the estimated portions of the additive genetic variance explained by the markers were $\widehat{q}_{PHT} = \dfrac{15.75}{83} = 0.1897$, $\widehat{q}_{PHT} = \dfrac{9.78}{83} = 0.1178$, and $\widehat{q}_{PHT} = \dfrac{11.5}{83} = 0.1385$ respectively, whence the estimated LMSI efficiencies were 1.1, 1.04, and 1.05 respectively. The latter results indicate that the estimated values of LMSI efficiency tend to increase when the estimated values of the variance of the marker score increase (Fig. 4.1).
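The efficiency expression can be checked against the ratio of the two selection responses it is meant to summarize. The sketch below assumes the usual one-trait phenotypic selection response $k_I \sigma_g^2 / \sigma_y$ for the denominator (the selection differential cancels in the ratio) and uses the unrounded heritability $83/191.81$ rather than 0.433.

```python
import math

def lmsi_efficiency(q, h2):
    """Efficiency of the one-trait LMSI relative to phenotypic selection."""
    return math.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))

var_y, var_g, var_s = 191.81, 83.0, 48.23   # PHT estimates, marker variance from Eq. (4.29)
h2 = var_g / var_y   # estimated heritability
q = var_s / var_g    # portion of genetic variance explained by the markers
efficiency = lmsi_efficiency(q, h2)

# Same quantity obtained as a ratio of responses (k_I cancels out)
r_lmsi = math.sqrt((var_g * (var_g - var_s) + var_s * (var_y - var_g)) / (var_y - var_s))
r_phenotypic = var_g / math.sqrt(var_y)   # assumed one-trait phenotypic response per unit k_I

print(efficiency, r_lmsi / r_phenotypic)
```

Both routes give the same value for any admissible variances, and the efficiency equals 1 when the markers explain none of the genetic variance.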

Figure 4.1 presents the change in LMSI efficiency with respect to phenotypic selection for different values of the variance of the marker score when the phenotypic (191.81) and genetic (83) variances are fixed. In a similar manner, Fig. 4.2 presents the change in the LMSI selection response for different values of the variance of the marker score when the phenotypic (191.81) and genetic (83) variances are fixed. In effect, LMSI efficiency and the selection response depend on the genetic variance explained by the markers.

#### 4.3.4 Estimating the Variance of the Marker Score in the Multi-Trait Case

Equation (4.33) can be used in the multi-trait context when the numbers of markers associated with the ith and jth traits are different. Also, it is possible to adapt Eqs. (4.32a) and (4.32b) to the multi-trait case. However, in the latter case, in addition to the markers linked to the QTL that affect one specific trait, we need to find markers that affect more than one trait, which may be very difficult. For this reason, in the multi-trait context, Eqs. (4.32a) and (4.32b) could be used to estimate the variance of the marker score (S) without preselecting the markers that affect the phenotypic traits, only when the number of genotypes is higher than the number of markers.

Let $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_r$ be $r$ independent multivariate normal vectors of observations, each with $n$ observations, such that

$$\mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1t} \\ y_{21} & y_{22} & \cdots & y_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nt} \end{bmatrix}$$

is an $n \times t$ matrix of observations for $t$ traits; then, the multivariate linear regression model can be written as $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U}$, where $\mathbf{X}$ is an $n \times m$ matrix ($m$ = number of markers and $m < n$) of known coded marker values, $\mathbf{B}$ is an $m \times t$ matrix of regression coefficients, and $\mathbf{U}$ is an $n \times t$ matrix of unobserved random disturbances whose rows, for given $\mathbf{X}$, are uncorrelated, each with mean $\mathbf{0}$ and common covariance matrix $\mathbf{E}$ (Mardia et al. 1982; Rencher 2002). According to the least squares method of estimation, $\widehat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ is an estimator of $\mathbf{B}$ and $\widehat{\mathbf{E}} = \dfrac{(\mathbf{Y} - \mathbf{X}\widehat{\mathbf{B}})'(\mathbf{Y} - \mathbf{X}\widehat{\mathbf{B}})}{n - m - 1}$ is an estimator of the residual covariance matrix $\mathbf{E}$ assuming that $n > m$ (Johnson and Wichern 2007). Note that $1 - R^2 = \dfrac{\widehat{\mathbf{e}}'\widehat{\mathbf{e}}}{\mathbf{y}'\mathbf{y}}$, where $\widehat{\mathbf{e}}$ is a vector of estimated residual values of the model $y_i = \theta_0 + \sum_{j \in M} \theta_j x_j + e_i$ and $R^2$ is the coefficient of multiple determination (Eq. 4.32a). In addition, as in the multi-trait context the estimated matrix of residuals is $\widehat{\mathbf{U}} = \mathbf{Y} - \mathbf{X}\widehat{\mathbf{B}}$, $1 - R^2$ can be written as $\mathbf{D} = (\mathbf{Y}'\mathbf{Y})^{-1}\widehat{\mathbf{U}}'\widehat{\mathbf{U}}$ (Mardia et al. 1982), whence $R^2$ in the multivariate context can be written as

$$\mathbf{R}^2 = \mathbf{I} - \mathbf{D} = \widehat{\mathbf{P}}^{-1}\widehat{\mathbf{S}},\tag{4.34a}$$

whereas $R^2_{Adj}$ (Eq. 4.32b) can be written as

$$\mathbf{R}\_{Adj}^2 = \mathbf{I} - \frac{n-1}{n-m-1} \mathbf{D} = \widehat{\mathbf{P}}^{-1} \widehat{\mathbf{S}},\tag{4.34b}$$

where $\mathbf{I}$ is a $t \times t$ identity matrix, $\widehat{\mathbf{P}}^{-1}$ is the inverse of the estimated covariance matrix of phenotypic values ($\widehat{\mathbf{P}}$), and $\widehat{\mathbf{S}}$ is the estimated covariance matrix of marker score values. From Eq. (4.34b),

$$
\widehat{\mathbf{P}} \mathbf{R}\_{\text{Adj}}^2 = \widehat{\mathbf{S}} \tag{4.34c}
$$

is an unbiased estimator of matrix $\mathbf{S}$, whereas $\widehat{\mathbf{P}}\mathbf{R}^2 = \widehat{\mathbf{S}}$ (Eq. 4.34a) is a biased estimator of matrix $\mathbf{S}$. The main problem with Eq. (4.34c) is that the diagonal elements of $\widehat{\mathbf{S}}$ could be negative.
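The estimators in Eqs. (4.34a) to (4.34c) can be sketched numerically as follows. This is a minimal illustration on simulated data, not the procedure used for the maize data: it assumes that $\mathbf{Y}$ and $\mathbf{X}$ are column-centered and that $\widehat{\mathbf{P}}$ is the sample covariance $\mathbf{Y}'\mathbf{Y}/(n-1)$ (in the book $\widehat{\mathbf{P}}$ may be obtained differently, e.g., by REML). The function name `marker_score_covariance` and all simulated values are hypothetical.

```python
import numpy as np

def marker_score_covariance(Y, X):
    """Sketch of Eqs. (4.34a)-(4.34c) for Y (n x t) and X (n x m), m < n."""
    n, m = X.shape
    t = Y.shape[1]
    Yc = Y - Y.mean(axis=0)                          # centered phenotypes (assumption)
    Xc = X - X.mean(axis=0)                          # centered marker codes (assumption)
    B_hat, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)  # least squares (X'X)^{-1} X'Y
    U_hat = Yc - Xc @ B_hat                          # estimated residual matrix
    D = np.linalg.solve(Yc.T @ Yc, U_hat.T @ U_hat)  # D = (Y'Y)^{-1} U'U
    R2 = np.eye(t) - D                               # Eq. (4.34a)
    R2_adj = np.eye(t) - (n - 1) / (n - m - 1) * D   # Eq. (4.34b)
    P_hat = Yc.T @ Yc / (n - 1)                      # sample phenotypic covariance
    E_hat = U_hat.T @ U_hat / (n - m - 1)            # residual covariance estimator
    return P_hat @ R2, P_hat @ R2_adj, P_hat, E_hat, Xc @ B_hat

# Hypothetical simulated data: 200 genotypes, 5 markers, 2 traits
rng = np.random.default_rng(0)
X_demo = rng.integers(-1, 2, size=(200, 5)).astype(float)
B_true = rng.normal(size=(5, 2))
Y_demo = X_demo @ B_true + rng.normal(scale=2.0, size=(200, 2))
S_biased, S_adj, P_hat, E_hat, fitted = marker_score_covariance(Y_demo, X_demo)
```

Under these assumptions, the biased version equals the sample covariance of the fitted marker scores, whereas the adjusted version reduces to $\widehat{\mathbf{P}} - \widehat{\mathbf{E}}$, which makes it clear why its diagonal elements can become negative when the residual variance is large.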

From the maize F2 population including 247 genotypes (each one with two repetitions) and 195 molecular markers described in Sect. 4.3.1, we used two traits, PHT (cm) and EHT (cm), to illustrate the multivariate method of estimating the LMSI parameters. The estimated phenotypic and genetic covariance matrices were

$$\widehat{\mathbf{P}} = \begin{bmatrix} 191.81 & 106.89 \\ 106.89 & 167.93 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{C}} = \begin{bmatrix} 83.00 & 57.44 \\ 57.44 & 59.80 \end{bmatrix},$$

whereas the estimated covariance matrix of marker scores, using Eq. (4.33), was

$$\widehat{\mathbf{S}} = \begin{bmatrix} 15.750 & 0.983 \\ 0.983 & 28.083 \end{bmatrix}.$$

When we used Eq. (4.34a) and Eq. (4.34c), we obtained estimated values of the variance and covariance of the marker scores that were higher than the genetic values (data not presented). Equations (4.29) and (4.31) are used later to compare LMSI efficiency versus GW-LMSI efficiency using the simulated data described in Chap. 2, Sect. 2.8.1. With matrices $\widehat{\mathbf{P}}$, $\widehat{\mathbf{C}}$, and $\widehat{\mathbf{S}}$, and the vector of economic weights $\mathbf{a}' = [\mathbf{w}' \;\; \mathbf{0}']$, where $\mathbf{w}' = [-1 \;\; -1]$ and $\mathbf{0}' = [0 \;\; 0]$, we obtained the estimated matrices

$$\widehat{\mathbf{T}}_M = \begin{bmatrix} \widehat{\mathbf{P}} & \widehat{\mathbf{S}} \\ \widehat{\mathbf{S}} & \widehat{\mathbf{S}} \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{Z}}_M = \begin{bmatrix} \widehat{\mathbf{C}} & \widehat{\mathbf{S}} \\ \widehat{\mathbf{S}} & \widehat{\mathbf{S}} \end{bmatrix},$$

whence the estimated LMSI vector of coefficients was $\widehat{\boldsymbol{\beta}}' = \mathbf{a}'\widehat{\mathbf{Z}}_M\widehat{\mathbf{T}}_M^{-1} = [-0.59 \;\; -0.18 \;\; -0.41 \;\; -0.82]$. Using a selection intensity of 10% ($k_I = 1.755$), the estimated LMSI selection response and the expected genetic gains per trait were

$$\widehat{R}_M = k_I\sqrt{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{T}}_M\widehat{\boldsymbol{\beta}}} = 20.41 \quad \text{and} \quad \widehat{\mathbf{E}}_M' = k_I\frac{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{Z}}_M}{\sqrt{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{T}}_M\widehat{\boldsymbol{\beta}}}} = [-10.09 \;\; -10.31 \;\; -2.53 \;\; -4.39]$$

respectively, whereas the estimated LMSI accuracy was $\widehat{\rho}_{H\widehat{I}_M} = \dfrac{\widehat{\sigma}_{I_M}}{\widehat{\sigma}_H} = 0.72$.

The estimated LPSI parameters (see Chap. 2 for details) using the phenotypic information from the maize F2 population for traits PHT and EHT are as follows. The estimated LPSI vector of coefficients was $\widehat{\mathbf{b}}' = \mathbf{w}'\widehat{\mathbf{C}}\widehat{\mathbf{P}}^{-1} = [-0.53 \;\; -0.36]$, and, with a selection intensity of 10% ($k_I = 1.755$), the estimated LPSI selection response and the expected genetic gains per trait were $\widehat{R}_I = k_I\sqrt{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}} = 18.97$ and $\widehat{\mathbf{E}}' = k_I\dfrac{\widehat{\mathbf{b}}'\widehat{\mathbf{C}}}{\widehat{\sigma}_I} = [-10.52 \;\; -8.45]$ respectively, whereas the estimated LPSI accuracy was $\widehat{\rho}_{H\widehat{I}} = \dfrac{\widehat{\sigma}_I}{\widehat{\sigma}_H} = 0.67$.

We can determine LMSI efficiency versus LPSI efficiency to predict the net genetic merit using the ratio of the estimated accuracy values $\widehat{\rho}_{H\widehat{I}_M} = 0.72$ and $\widehat{\rho}_{H\widehat{I}} = 0.67$ of the LMSI and LPSI respectively, i.e., $\widehat{\lambda}_M = \dfrac{0.72}{0.67} = 1.075$, whence, according to Eq. (4.19), the estimated LMSI efficiency versus the LPSI efficiency, in percentage terms, was $\widehat{p}_M = 100(1.075 - 1) = 7.5$. That is, for these data, the estimated LMSI efficiency was only 7.5% greater than LPSI efficiency at predicting the net genetic merit.
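The LMSI and LPSI results reported above for PHT and EHT can be retraced with a few lines of linear algebra. The sketch below uses only the estimated matrices, economic weights, and selection intensity given in the text; the variable names are ours, and the computation assumes the LMSI coefficient vector is obtained as $\widehat{\mathbf{T}}_M^{-1}\widehat{\mathbf{Z}}_M\mathbf{a}$ and the LPSI vector as $\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}\mathbf{w}$.

```python
import numpy as np

# Estimated matrices for PHT and EHT as reported in the text
P = np.array([[191.81, 106.89], [106.89, 167.93]])   # phenotypic covariance
C = np.array([[83.00, 57.44], [57.44, 59.80]])       # genetic covariance
S = np.array([[15.750, 0.983], [0.983, 28.083]])     # marker score covariance
w = np.array([-1.0, -1.0])                           # economic weights
k_I = 1.755                                          # 10% selection intensity

sigma_H = np.sqrt(w @ C @ w)                         # std. dev. of net genetic merit

# LMSI: block matrices T_M and Z_M, weights a' = [w' 0']
T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.concatenate([w, np.zeros(2)])
beta = np.linalg.solve(T_M, Z_M @ a)                 # LMSI coefficients
sigma_IM = np.sqrt(beta @ T_M @ beta)
R_M = k_I * sigma_IM                                 # LMSI selection response
E_M = k_I * (beta @ Z_M) / sigma_IM                  # expected genetic gains
rho_M = sigma_IM / sigma_H                           # LMSI accuracy

# LPSI: b = P^{-1} C w
b = np.linalg.solve(P, C @ w)
sigma_I = np.sqrt(b @ P @ b)
R_I = k_I * sigma_I                                  # LPSI selection response
E_I = k_I * (b @ C) / sigma_I                        # expected genetic gains
rho_I = sigma_I / sigma_H                            # LPSI accuracy

p_M = 100.0 * (rho_M / rho_I - 1.0)                  # efficiency, Eq. (4.19)
```

Because the text rounds both accuracies to two decimals before taking their ratio, `p_M` computed from the unrounded accuracies is approximately 7.6 rather than exactly 7.5.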

#### 4.4 Estimating the GW-LMSI Parameters in the Asymptotic Context

Lange and Whittaker (2001) proposed the GW-LMSI. However, these authors did not provide detailed procedures for estimating matrices $\mathbf{P}$, $\mathbf{C}$, $\mathbf{W}$, and $\mathbf{M}$. They indicated that matrix $\mathbf{C}$ can be estimated using the estimated covariance matrix of marker scores ($\widehat{\mathbf{S}}$) and that matrices $\mathbf{P}$, $\mathbf{W}$, and $\mathbf{M}$ can be estimated directly by their empirical variances and covariances, but this statement does not give a clear method for estimating those covariance matrices. In Chap. 2, we described the REML method of estimating $\mathbf{C}$ and $\mathbf{P}$. Crossa and Cerón-Rojas (2011) described matrices $\mathbf{W}$ and $\mathbf{M}$ in a doubled haploid population. Here, we describe and estimate matrices $\mathbf{W}$ and $\mathbf{M}$ for an F2 population in the asymptotic context according to the Wright and Mowers (1994) approach, which is based on regressing phenotypic values on coded marker values. We used this approach to estimate $\mathbf{W}$ and $\mathbf{M}$ because it is a clearer estimation method than that of Lange and Whittaker (2001); however, the Wright and Mowers (1994) approach is an asymptotic method and its results should be treated with caution.

Matrix $\mathbf{M}$ is the covariance matrix of the coded molecular marker values. All marker information used to construct matrix $\mathbf{M}$ is presented in Table 4.2. Based on this information, we found that the expectations ($E(X_1)$ and $E(X_2)$) and the variances ($V(X_1)$ and $V(X_2)$) of the coded marker values $X_1$ and $X_2$ are $E(X_1) = E(X_2) = 0$ and $V(X_1) = V(X_2) = 1$, whereas the covariance ($Cov(X_1, X_2)$) and correlation ($Corr(X_1, X_2)$) between $X_1$ and $X_2$ were

$$\operatorname{Cov}(X\_1, X\_2) = \operatorname{Corr}(X\_1, X\_2) = 1 - 2\delta.\tag{4.35}$$

Thus, as the variances of $X_1$ and $X_2$ are equal to 1, the correlation between $X_1$ and $X_2$ is $Corr(X_1, X_2) = \dfrac{Cov(X_1, X_2)}{\sqrt{V(X_1)V(X_2)}} = 1 - 2\delta$, i.e., the covariance and correlation between $X_1$ and $X_2$ are the same. Equation (4.35) indicates that if we perform the same operation with many markers, we will obtain similar results; it also indicates how matrix $\mathbf{M}$ can be constructed.



Let $\mathbf{X}$ be a matrix of coded markers of size $n \times m$, where $n > m$ and $m$ = number of markers; then, according to Wright and Mowers (1994), because all marker information is contained in matrix $\mathbf{X}'\mathbf{X}$, when the number of observations ($n$) tends to infinity, the product $\mathbf{x}_i'\mathbf{x}_j/n$ tends to the covariance between the $i$th and $j$th markers, whence matrix $n^{-1}\mathbf{X}'\mathbf{X}$ should tend to the covariance matrix of the markers that make up matrix $\mathbf{X}$, with the $ij$th element equal to $(0.5 - \delta_{ij})$. Thus, matrix $2n^{-1}\mathbf{X}'\mathbf{X}$ should tend to a covariance matrix whose $ij$th entry is equal to $(1 - 2\delta_{ij})$. Based on the latter result, an estimator of matrix $\mathbf{M}$ in the asymptotic context is


$$
\hat{\mathbf{M}} = 2n^{-1}\mathbf{X}'\mathbf{X}.\tag{4.36}
$$

Equation (4.36) is an asymptotic result and should be taken with caution. To date, there has been no clear method for estimating M in the non-asymptotic context; for this reason, Eq. (4.36) is used to estimate the GW-LMSI parameters.
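The asymptotic behavior behind Eq. (4.36) can be checked by simulation. The sketch below is illustrative only: it assumes the usual F2 coding of 1, 0, and −1 for the three marker genotypes (an assumption on our part), and the function names `simulate_f2_marker_pair` and `estimate_M` are hypothetical. With a large number of simulated plants, the off-diagonal entry of $\widehat{\mathbf{M}}$ should be close to $1 - 2\delta$.

```python
import numpy as np

def simulate_f2_marker_pair(n, delta, rng):
    """Coded values (1, 0, -1) at two linked markers for n F2 plants.

    Each plant receives two gametes from F1 parents; within a gamete the
    allele at marker 2 differs from the allele at marker 1 with
    probability delta (the recombination frequency).
    """
    first = rng.integers(0, 2, size=(n, 2))        # allele at marker 1 per gamete
    recomb = rng.random((n, 2)) < delta            # recombination events
    second = np.where(recomb, 1 - first, first)    # allele at marker 2 per gamete
    x1 = first.sum(axis=1) - 1                     # code 1, 0 or -1
    x2 = second.sum(axis=1) - 1
    return np.column_stack([x1, x2]).astype(float)

def estimate_M(X):
    """Asymptotic estimator of Eq. (4.36): M_hat = 2 n^{-1} X'X."""
    return 2.0 * X.T @ X / X.shape[0]

rng = np.random.default_rng(42)
delta = 0.10
X_sim = simulate_f2_marker_pair(200_000, delta, rng)
M_hat = estimate_M(X_sim)
```

In this setting the diagonal of `M_hat` should be close to 1 and the off-diagonal close to $1 - 2(0.10) = 0.8$; with only a few hundred genotypes the sampling error is much larger, which is one reason to treat the estimator with caution.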

Assume that a QTL is located between the two markers in Table 4.2; then, $\delta$ can be written as $\delta = r_1 + r_2 - 2r_1r_2$, where $r_1$ and $r_2$ denote the recombination frequencies between the QTL and marker 1 and between the QTL and marker 2 respectively. When the number of genotypes or individuals tends to infinity, the covariance between the phenotypic trait values ($y$) and the coded values of marker 1 ($X_1$) in an F2 population can be written as

$$Cov(X_1, y) = \frac{1}{2}\alpha_1(1 - 2r_1),\tag{4.37}$$

where α1(1 - 2r1) is the portion of the additive effect (α1) of the QTL linked to marker 1 (Edwards et al. 1987), and r<sup>1</sup> is the recombination frequency between the QTL and marker 1. We can assume that for many markers, the covariance of the phenotypic values is similar to Eq. (4.37), whence matrix W can be obtained.

Let $\mathbf{y}$ be an $n \times 1$ vector of recorded phenotypic values, where $n$ denotes the number of observations or records, and let $\mathbf{X}$ be a matrix of coded markers of size $n \times m$. When $n$ tends to infinity, $2n^{-1}\mathbf{X}'\mathbf{y}$ tends to a vector with elements equal to $\alpha_i(1 - 2r_i)$, where $\alpha_i$ is the additive effect of the $i$th QTL linked to the $i$th marker, and $r_i$ is the recombination frequency between the $i$th QTL and the $i$th marker. Now let

$$\mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1t} \\ y_{21} & y_{22} & \cdots & y_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nt} \end{bmatrix}$$

be a matrix of observations for $t$ traits; then, an estimator of matrix $\mathbf{W}$ in the asymptotic context is

$$
\hat{\mathbf{W}} = 2n^{-1}\mathbf{X}'\mathbf{Y}.\tag{4.38}
$$

Once again, Eq. (4.38) is an asymptotic result and should be accepted with caution. But to date, there has been no clear method for estimating W in the non-asymptotic context; for this reason, Eq. (4.38) is used to estimate the GW-LMSI parameters.
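A similar simulation illustrates Eqs. (4.37) and (4.38). The sketch below assumes a single QTL with purely additive effect $\alpha$, genotype coding 1, 0, −1 at both the QTL and the marker, and normally distributed residuals; all of these, together with the function name `estimate_W` and the chosen parameter values, are assumptions made for illustration.

```python
import numpy as np

def estimate_W(X, Y):
    """Asymptotic estimator of Eq. (4.38): W_hat = 2 n^{-1} X'Y."""
    return 2.0 * X.T @ Y / X.shape[0]

rng = np.random.default_rng(1)
n, r, alpha = 200_000, 0.05, 3.0                 # hypothetical values

# Two gametes per F2 plant: allele at the QTL, then at the linked marker
qtl_gametes = rng.integers(0, 2, size=(n, 2))
recomb = rng.random((n, 2)) < r                  # QTL-marker recombination
marker_gametes = np.where(recomb, 1 - qtl_gametes, qtl_gametes)

x_qtl = qtl_gametes.sum(axis=1) - 1.0            # coded QTL genotype
x_marker = marker_gametes.sum(axis=1) - 1.0      # coded marker genotype
y = alpha * x_qtl + rng.normal(0.0, 1.0, size=n) # additive QTL plus residual

# One marker and one trait, so W_hat is a 1 x 1 matrix
W_hat = estimate_W(x_marker[:, None], y[:, None])
expected = alpha * (1 - 2 * r)                   # limit implied by Eq. (4.37)
```

With these settings `W_hat[0, 0]` should be close to `expected`, that is, $3(1 - 0.1) = 2.7$. With several markers and traits the same function returns the full $m \times t$ matrix.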

#### 4.5 Comparing LMSI Versus LPSI and GW-LMSI Efficiency

To compare LMSI efficiency versus GW-LMSI efficiency for predicting the net genetic merit, we use the simulated data set described in Chap. 2, Sect. 2.8.1.

Figure 4.4 presents the estimated accuracy values of the LPSI ($\widehat{\rho}_{H\widehat{I}} = \widehat{\sigma}_{\widehat{I}}/\widehat{\sigma}_H$), the LMSI ($\widehat{\rho}_{H\widehat{I}_M} = \widehat{\sigma}_{\widehat{I}_M}/\widehat{\sigma}_H$), and the GW-LMSI ($\widehat{\rho}_{H\widehat{I}_W} = \widehat{\sigma}_{\widehat{I}_W}/\widehat{\sigma}_H$) for five simulated selection cycles. In addition, Table 4.3 presents the estimated LPSI, LMSI, and GW-LMSI selection responses; the estimated LPSI, LMSI, and GW-LMSI variances of the predicted error ($(1 - \widehat{\rho}^2_{H\widehat{I}})\widehat{\sigma}^2_H$, $(1 - \widehat{\rho}^2_{H\widehat{I}_M})\widehat{\sigma}^2_H$, and $(1 - \widehat{\rho}^2_{H\widehat{I}_W})\widehat{\sigma}^2_H$ respectively); and the ratios of the estimated LMSI accuracy to the estimated LPSI accuracy and of the estimated LMSI accuracy to the estimated GW-LMSI accuracy, expressed as percentages (Eq. 4.19), for five simulated selection cycles.

According to Fig. 4.4, for this data set the estimated LMSI accuracy ($\widehat{\rho}_{H\widehat{I}_M}$) was higher than the estimated LPSI and GW-LMSI accuracies ($\widehat{\rho}_{H\widehat{I}}$ and $\widehat{\rho}_{H\widehat{I}_W}$ respectively) for the five simulated selection cycles, that is, $\widehat{\rho}_{H\widehat{I}_M} > \widehat{\rho}_{H\widehat{I}} > \widehat{\rho}_{H\widehat{I}_W}$. In a similar manner, Table 4.3 indicates that the estimated LMSI selection response ($\widehat{R}_M$) was higher than the estimated LPSI and GW-LMSI selection responses ($\widehat{R}_I$ and $\widehat{R}_W$ respectively): $\widehat{R}_M > \widehat{R}_I > \widehat{R}_W$.

Note that the estimated LPSI, LMSI, and GW-LMSI variances of the predicted error, and the estimated LMSI efficiency versus LPSI efficiency and versus GW-LMSI efficiency (expressed in percentages) are related to the estimated LMSI, LPSI, and GW-LMSI accuracies, and that in all five selection cycles, $\widehat{\rho}_{H\widehat{I}_M} > \widehat{\rho}_{H\widehat{I}} > \widehat{\rho}_{H\widehat{I}_W}$. This implies that the estimated LMSI variance of the predicted error was lower than the estimated LPSI and GW-LMSI variances of the predicted error. In a similar manner, because $\widehat{\rho}_{H\widehat{I}_M} > \widehat{\rho}_{H\widehat{I}} > \widehat{\rho}_{H\widehat{I}_W}$, the estimated LMSI efficiency was higher than the estimated LPSI efficiency and the estimated GW-LMSI efficiency.

Fig. 4.4 Estimated correlation values of the linear phenotypic selection index (LPSI), the linear molecular selection index (LMSI), and the genome-wide LMSI (GW-LMSI) with the net genetic merit for four traits, 2500 markers and 500 genotypes (each with four repetitions) in one environment for five simulated selection cycles

Table 4.3 Estimated linear phenotypic, molecular, and genome-wide selection indices (LPSI, LMSI, and GW-LMSI respectively), selection responses and variance of the predicted error, and estimated ratio of LMSI accuracy to LPSI and GW-LMSI accuracy expressed in percentages for four traits, 2500 markers and 500 genotypes (each with four repetitions) in one environment for five simulated selection cycles

Based on the results in Fig. 4.4 and Table 4.3, we conclude that, for this simulated data set, the LMSI was a better predictor of the net genetic merit than the LPSI, and the LPSI was a better predictor of the net genetic merit than the GW-LMSI.


### Chapter 5 Linear Genomic Selection Indices

Abstract The linear genomic selection index (LGSI) is a linear combination of genomic estimated breeding values (GEBVs) used to predict the individual net genetic merit and select individual candidates from a nonphenotyped testing population as parents of the next selection cycle. In the LGSI, phenotypic and marker data from the training population are fitted into a statistical model to estimate all individual available genome marker effects; these estimates can then be used in subsequent selection cycles to obtain GEBVs that are predictors of breeding values in a testing population for which there is only marker information. The GEBVs are obtained by multiplying the estimated marker effects in the training population by the coded marker values obtained in the testing population in each selection cycle. Applying the LGSI in plant or animal breeding requires genotyping the candidates for selection to obtain the GEBVs, and then predicting and ranking the net genetic merit of the candidates using the LGSI. We describe the LGSI and show that it is a direct application of the linear phenotypic selection index theory in the genomic selection context; next, we present the combined LGSI (CLGSI), which uses phenotypic and GEBV information jointly to predict the net genetic merit. The CLGSI can be used only in training populations, where both phenotypic and marker information are available, whereas the LGSI is used in testing populations, where there is only marker information. We validate the theoretical results of the LGSI and CLGSI using real and simulated data.

#### 5.1 The Linear Genomic Selection Index

#### 5.1.1 Basic Conditions for Constructing the LGSI

Conditions described in Chap. 4 (Sect. 4.1.1) for constructing a valid linear molecular selection index (LMSI), are also necessary for the linear genomic selection index (LGSI); however, in addition to those conditions, the LGSI also requires:


#### 5.1.2 Genomic Breeding Values and Marker Effects

The breeding value ($g_i$) is the average additive effect of the genes an individual receives from both parents; thus, it is a function of the genes transmitted from parents to progeny and is the only component that can be selected and, therefore, the main component of interest in breeding programs (Mrode 2005). The $i$th phenotypic value ($y_i$) can be denoted as $y_i = g_i + e_i$, where $g_i$ is the breeding value and $e_i$ the residual. The basic assumptions for $g_i$ and $e_i$ are that both have a normal distribution with expectation equal to zero and variances $\sigma^2_{g_i}$ and $\sigma^2_{e_i}$ respectively. This means that $y_i = \mu_i + g_i + e_i$ is a linear mixed model (Mrode 2005; Searle et al. 2006), where $\mu_i$ is the mean of $y_i$.

Let $\mathbf{y}_i' = [y_{i1} \;\; y_{i2} \;\; \cdots \;\; y_{in}]$ be a $1 \times n$ vector of observations of the $i$th trait and let $\mathbf{g}_i' = [g_{i1} \;\; g_{i2} \;\; \cdots \;\; g_{in}]$ be a $1 \times n$ vector of unobservable breeding values associated with $\mathbf{y}_i$; then $\mathbf{y}_i$ can be written as

$$\mathbf{y}\_i = \mathbf{1}\mu\_i + \mathbf{Z}\mathbf{g}\_i + \mathbf{e}\_i,\tag{5.1}$$

where $\mu_i$ is the mean of the $i$th trait, $\mathbf{1}$ is an $n \times 1$ vector of 1s, $\mathbf{Z}$ is a design matrix of 0s and 1s, $\mathbf{g}_i \sim \text{MVN}(\mathbf{0}, \mathbf{A}\sigma^2_{g_i})$ is a vector of breeding values, and $\mathbf{e}_i \sim \text{MVN}(\mathbf{0}, \mathbf{I}_n\sigma^2_{e_i})$ is a vector of residuals; $\mathbf{0}$ is the mean and $\mathbf{A}\sigma^2_{g_i}$ and $\mathbf{I}_n\sigma^2_{e_i}$ are the covariance matrices of $\mathbf{g}_i$ and $\mathbf{e}_i$ respectively; $\mathbf{A}$ is the numerical relationship matrix (Mrode 2005) and $\mathbf{I}_n$ an $n \times n$ identity matrix; $\sigma^2_{g_i}$ and $\sigma^2_{e_i}$ are the additive and residual variances associated with $\mathbf{g}_i$ and $\mathbf{e}_i$; and MVN stands for multivariate normal distribution.

Suppose that $\mathbf{A}$, $\mathbf{Z}$, $\mu_i$, $\sigma^2_{g_i}$, and $\sigma^2_{e_i}$ are known; then, according to Mrode (2005), the best linear unbiased predictor (BLUP) of $\mathbf{g}_i$ can be written as

$$
\hat{\mathbf{g}}\_i = \sigma\_{\mathbf{g}\_i}^2 \mathbf{A} \mathbf{Z}^\prime \mathbf{V}^{-1} (\mathbf{y}\_i - \mathbf{1}\mu\_i),
\tag{5.2}
$$

where $\mathbf{V}^{-1}$ is the inverse of the variance matrix of $\mathbf{y}_i$, i.e., $Var(\mathbf{y}_i) = \sigma^2_{g_i}\mathbf{ZAZ}' + \mathbf{I}_n\sigma^2_{e_i} = \mathbf{V}$. In the context of animal breeding, Eq. (5.2) is considered a univariate linear phenotypic selection index (LPSI) (Mrode 2005) and is used to rank and select individuals as parents of the next generation in the context of one trait. Equation (5.2) can be extended to the multi-trait phenotypic selection index case, but to predict the net genetic merit ($H = \mathbf{w}'\mathbf{g}$, see Chap. 2 for details) it would be necessary to construct linear combinations of the predicted values of $\mathbf{g}_i$ associated with the traits of interest, as was described in the Foreword of this book.
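Equation (5.2) translates directly into code. The following minimal sketch assumes that all variance components and the mean are known, as the text does; the function name `blup_breeding_values` and the numerical values are hypothetical. In the special case of unrelated individuals with one record each ($\mathbf{A} = \mathbf{Z} = \mathbf{I}_n$), the BLUP simply shrinks each deviation from the mean by the factor $\sigma^2_{g_i}/(\sigma^2_{g_i} + \sigma^2_{e_i})$.

```python
import numpy as np

def blup_breeding_values(y, mu, Z, A, var_g, var_e):
    """BLUP of Eq. (5.2): g_hat = var_g * A Z' V^{-1} (y - 1 mu)."""
    n = y.shape[0]
    V = var_g * Z @ A @ Z.T + var_e * np.eye(n)     # Var(y)
    return var_g * A @ Z.T @ np.linalg.solve(V, y - mu)

# Hypothetical example: four unrelated individuals, one record each
y_obs = np.array([10.0, 12.0, 9.0, 11.0])
mu_known = 10.5
var_g, var_e = 2.0, 6.0
g_hat = blup_breeding_values(y_obs, mu_known, np.eye(4), np.eye(4), var_g, var_e)
```

Here the shrinkage factor is $2/(2+6) = 0.25$, so each predicted breeding value is one quarter of the corresponding phenotypic deviation. With a non-identity $\mathbf{A}$, records from relatives also contribute to each prediction.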

The vector of the individual genomic breeding values ($\boldsymbol{\gamma}_i$) associated with the $i$th characteristic ($i = 1, 2, \ldots, t$; $t$ = number of traits) of the candidates for selection can be written as

$$\boldsymbol{\gamma}_i = \mathbf{X}\mathbf{u}_i,\tag{5.3}$$

where $\mathbf{X}$ is an $n \times m$ matrix ($n$ = number of observations and $m$ = number of markers in the population) of coded marker values ($2 - 2p$, $1 - 2p$, and $-2p$ for genotypes AA, Aa, and aa respectively) associated with the additive effects of the quantitative trait loci (QTL) and $\mathbf{u}_i$ is an $m \times 1$ vector of the additive effects of the QTL associated with markers that affect the $i$th trait. It is assumed that $\boldsymbol{\gamma}_i$ has MVN with mean $\mathbf{0}$ and variance $\mathbf{G}\sigma^2_{\gamma_i}$, i.e., $\boldsymbol{\gamma}_i \sim \text{MVN}(\mathbf{0}, \mathbf{G}\sigma^2_{\gamma_i})$, where $\sigma^2_{\gamma_i}$ is the additive genomic variance of $\boldsymbol{\gamma}_i$ and $\mathbf{G} = \mathbf{XX}'/c$ is the $n \times n$ additive genomic relationship matrix between genotypes; $c = \sum_{j=1}^{m} 2p_j(1 - p_j)$ in an F2 population, and $c = \sum_{j=1}^{m} 4p_j(1 - p_j)$ in a doubled haploid population; $p_j$ is the frequency of allele A and $1 - p_j$ is the frequency of allele a in the $j$th marker ($j = 1, 2, \ldots, m$).

The additive genomic relationship matrix $\mathbf{G} = \mathbf{XX}'/c$ has special properties. For example, in the asymptotic context, the expectation of matrix $\mathbf{G}$ is equal to the numerical relationship matrix $\mathbf{A}$, i.e., $E(\mathbf{G}) = \mathbf{A}$ (Habier et al. 2007; Van Raden 2008); this means that $\mathbf{G}$ is a particular realization of $\mathbf{A}$ and when the number of markers and genotypes increases in the training population, the value of $\mathbf{G}$ tends to concentrate around $\mathbf{A}$. Thus, it can be assumed that at the limit, when the number of markers and genotypes is very high, $\mathbf{G} = \mathbf{A}$ (Cerón-Rojas and Sahagún-Castellanos 2016).
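The construction of $\mathbf{X}$ and $\mathbf{G}$ described above can be sketched as follows. The sketch assumes that genotypes are supplied as counts of allele A (2, 1, 0 for AA, Aa, aa) and that allele frequencies are estimated from the same sample; the function name `genomic_relationship` and the simulated genotypes are hypothetical.

```python
import numpy as np

def genomic_relationship(counts, population="F2"):
    """Return (G, X, c) from an n x m matrix of allele-A counts (0, 1, 2).

    X holds the coded values 2-2p, 1-2p, -2p for AA, Aa, aa, and
    c = sum 2p(1-p) for an F2 or sum 4p(1-p) for a doubled haploid population.
    """
    p = counts.mean(axis=0) / 2.0                 # estimated frequency of allele A
    X = counts - 2.0 * p                          # coded marker values
    factor = 2.0 if population == "F2" else 4.0
    c = np.sum(factor * p * (1.0 - p))
    if c <= 0:
        raise ValueError("all markers are monomorphic; c must be positive")
    return X @ X.T / c, X, c

# Hypothetical genotypes: 50 individuals, 200 markers
rng = np.random.default_rng(7)
counts_demo = rng.integers(0, 3, size=(50, 200)).astype(float)
G_demo, X_demo_coded, c_demo = genomic_relationship(counts_demo)
```

Because the coded values are centered at $2p$, each column of `X_demo_coded` has mean zero when the allele frequencies are estimated from the sample itself, and `G_demo` is symmetric and positive semi-definite by construction.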

The vector of genomic breeding values (Eq. 5.3) has a similar function in genomic selection as $\mathbf{g}_i$ in the phenotypic selection context. In addition, $\mathbf{g}_i$ can be written as $\mathbf{g}_i = \boldsymbol{\gamma}_i + \boldsymbol{\eta}_i$, where $\boldsymbol{\eta}_i = \mathbf{g}_i - \boldsymbol{\gamma}_i$ (Gianola et al. 2003). Also, note that

$$Cov(\mathbf{g}_i, \boldsymbol{\gamma}_i) = \sigma_{\gamma_i}^2,\tag{5.4}$$

i.e., the covariance between $\boldsymbol{\gamma}_i$ and $\mathbf{g}_i$ is equal to the variance of $\boldsymbol{\gamma}_i$ (Dekkers 2007).

Let $\mathbf{y}_i' = [y_{i1} \;\; y_{i2} \;\; \cdots \;\; y_{in}]$ be a $1 \times n$ vector of observations of the $i$th trait in the training population and let $\boldsymbol{\gamma}_i' = [\gamma_{i1} \;\; \gamma_{i2} \;\; \cdots \;\; \gamma_{in}]$ be a $1 \times n$ vector of unobservable genomic breeding values associated with $\mathbf{y}_i$; then, $\mathbf{y}_i$ can also be written as

$$\mathbf{y}_i = \mathbf{1}\mu_i + \mathbf{Z}\boldsymbol{\gamma}_i + \boldsymbol{\varepsilon}_i,\tag{5.5}$$

where $\mu_i$ is the mean of the $i$th trait, $\mathbf{1}$ is an $n \times 1$ vector of 1s, $\mathbf{Z}$ is a design matrix, $\boldsymbol{\gamma}_i \sim \text{MVN}(\mathbf{0}, \mathbf{G}\sigma^2_{\gamma_i})$ and $\boldsymbol{\varepsilon}_i \sim \text{MVN}(\mathbf{0}, \mathbf{I}_n\sigma^2_{\varepsilon_i})$ are vectors of genomic breeding values and of residuals respectively, and $\sigma^2_{\varepsilon_i}$ is the residual variance. $\mathbf{I}_n$, $\mathbf{G}$, and $\sigma^2_{\gamma_i}$ were defined in Eqs. (5.2) and (5.3).

According to Eqs. (5.2) and (5.3), when $\mu_i$, $\sigma^2_{\gamma_i}$, and $\sigma^2_{\varepsilon_i}$ are known, the vector of GEBVs for the individuals with the $i$th trait can be obtained as


$$
\widehat{\boldsymbol{\gamma}}_i = \sigma_{\gamma_i}^2 \mathbf{GZ}^\prime \mathbf{V}^{-1} (\mathbf{y}_i - \mathbf{1}\mu_i),
\tag{5.6}
$$

where the variance of $\mathbf{y}_i$ should now be written as $\mathbf{V} = \sigma^2_{\gamma_i}\mathbf{ZGZ}' + \mathbf{I}_n\sigma^2_{\varepsilon_i}$. In the context of genomic selection, Eq. (5.6) is considered a univariate LGSI and is used to rank and select individuals as parents of the next generation (Van Raden 2008; Togashi et al. 2011). Equation (5.6) is the BLUP of $\boldsymbol{\gamma}_i$ and can be extended to a multi-trait genomic selection index, but to predict the net genetic merit ($H = \mathbf{w}'\mathbf{g}$), it is necessary to construct an LGSI, which is a linear combination of the $\boldsymbol{\gamma}_i$.

Although Eq. (5.6) is theoretically very important in the LGSI, in practice we need to estimate the marker effects associated with all the traits of interest and to use these estimates in the testing population to obtain the GEBVs of the candidates for selection. Let $\mathbf{u}' = [\mathbf{u}_1' \;\; \mathbf{u}_2' \;\; \cdots \;\; \mathbf{u}_t']$ be a $1 \times mt$ vector of marker effects associated with $t$ traits. In the univariate context, Van Raden (2008) showed that the $i$th vector $\mathbf{u}_i$ of marker effects in the training population can be estimated as

$$
\widehat{\mathbf{u}}\_i = c^{-1} \mathbf{X}' [\mathbf{G} + \nu \mathbf{I}\_n]^{-1} (\mathbf{y}\_i - \mathbf{1}\mu\_i),
\tag{5.7}
$$

where $\nu = \dfrac{\sigma^2_{e_i}}{\sigma^2_{g_i}}$; $\sigma^2_{g_i}$, $\sigma^2_{e_i}$, and the other parameters were defined earlier. According to Ceron-Rojas et al. (2015), to estimate the vector $\mathbf{u}' = [\mathbf{u}_1' \;\; \mathbf{u}_2' \;\; \cdots \;\; \mathbf{u}_t']$ in the multi-trait context, Eq. (5.7) can be written as

$$
\widehat{\mathbf{u}} = c^{-1} \mathbf{W}\_t' [(\mathbf{I}\_t \otimes \mathbf{G}) + (\mathbf{N} \otimes \mathbf{I}\_n)]^{-1} (\mathbf{y} - \boldsymbol{\mu} \otimes \mathbf{1}),
\tag{5.8}
$$

where $\mathbf{W}_t = \mathbf{I}_t \otimes \mathbf{X}$, "$\otimes$" denotes the Kronecker product (Schott 2005), $c$ and $\mathbf{X}$ were defined in Eq. (5.3); $\mathbf{N} = \mathbf{RC}^{-1}$, where $\mathbf{R}$ and $\mathbf{C}$ are the residual and breeding value covariance matrices for $t$ traits respectively; $\mathbf{y}' = [\mathbf{y}_1' \;\; \mathbf{y}_2' \;\; \cdots \;\; \mathbf{y}_t'] \sim \text{MVN}(\boldsymbol{\mu}, \mathbf{V})$ is a vector of size $1 \times tn$, with covariance matrix $\mathbf{V} = \mathbf{C} \otimes \mathbf{G} + \mathbf{R} \otimes \mathbf{I}_n$; $\mathbf{I}_t$ is an identity matrix of size $t \times t$ and $\mathbf{I}_n$ was defined earlier; $\boldsymbol{\mu}' = [\mu_1 \;\; \mu_2 \;\; \cdots \;\; \mu_t]$ is a $1 \times t$ vector of means associated with vector $\mathbf{y}$, and $\mathbf{1}$ is an $n \times 1$ vector of 1s. In this case, the estimator of the vector of sub-vectors of genomic breeding values $\boldsymbol{\gamma}' = [\boldsymbol{\gamma}_1' \;\; \boldsymbol{\gamma}_2' \;\; \cdots \;\; \boldsymbol{\gamma}_t']$ in the testing population can be obtained as

$$
\widehat{\boldsymbol{\gamma}} = \mathbf{W}_t \widehat{\mathbf{u}}.\tag{5.9}
$$

Equation (5.9) is the vector of GEBVs for the multi-trait case. Thus, in the testing population, only the coded values in matrix $\mathbf{X}$ change in Eq. (5.9), whereas $\widehat{\mathbf{u}}$ is the same in each selection cycle. Note that to obtain Eqs. (5.7) and (5.8), we assumed that $\boldsymbol{\mu}$, $\mathbf{C}$, and $\mathbf{R}$ are known.
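Equations (5.7) to (5.9) can be sketched in code as follows. This is an illustration under simplifying assumptions, not the software used in the book: marker codes and phenotypes are simulated, the sample means stand in for the known $\boldsymbol{\mu}$, the matrices $\mathbf{C}$ and $\mathbf{R}$ are hypothetical, and the function names are ours. The stacked vector $\mathbf{y}$ is ordered trait by trait, as in the text.

```python
import numpy as np

def marker_effects(X, y, mu, c, nu):
    """Univariate estimator of Eq. (5.7)."""
    n = X.shape[0]
    G = X @ X.T / c
    return X.T @ np.linalg.solve(G + nu * np.eye(n), y - mu) / c

def marker_effects_multitrait(X, Y, mu, c, N):
    """Multi-trait estimator of Eq. (5.8); Y is n x t, mu has length t."""
    n = X.shape[0]
    t = Y.shape[1]
    G = X @ X.T / c
    W_t = np.kron(np.eye(t), X)                         # I_t (x) X
    lhs = np.kron(np.eye(t), G) + np.kron(N, np.eye(n))
    y_stacked = Y.T.reshape(-1)                         # [y_1', ..., y_t']
    resid = y_stacked - np.kron(mu, np.ones(n))         # y - mu (x) 1
    return W_t.T @ np.linalg.solve(lhs, resid) / c

# Hypothetical training data: 30 genotypes, 60 markers, 2 traits
rng = np.random.default_rng(3)
n, m, t = 30, 60, 2
counts = rng.integers(0, 3, size=(n, m)).astype(float)
p = counts.mean(axis=0) / 2.0
X = counts - 2.0 * p                                    # coded marker values
c = np.sum(2.0 * p * (1.0 - p))                         # F2 scaling constant
Y = rng.normal(size=(n, t))
mu = Y.mean(axis=0)                                     # stand-in for known means
C = np.array([[2.0, 0.5], [0.5, 1.0]])                  # assumed known
R = np.array([[3.0, 0.2], [0.2, 4.0]])                  # assumed known
N = R @ np.linalg.inv(C)

u_hat = marker_effects_multitrait(X, Y, mu, c, N)
gebv = np.kron(np.eye(t), X) @ u_hat                    # Eq. (5.9)
u1_hat = marker_effects(X, Y[:, 0], mu[0], c, nu=1.5)   # single-trait case
```

In a testing population, only the matrix passed to the final Kronecker product would change (the coded markers of the candidates), while `u_hat` is reused. As a consistency check, multiplying the coded markers by the estimated effects reproduces the BLUP form of Eq. (5.6) with $\mathbf{Z} = \mathbf{I}_n$.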

We indicated that the genomic breeding values have a normal distribution (Eq. 5.5). Using the simulated data described in Chap. 2, Sect. 2.8.1, in Fig. 5.1 we present the distribution of the GEBVs (Eq. 5.9) associated with trait T1 in the first (Fig. 5.1a) and the fifth (Fig. 5.1b) selection cycles in the testing population. Indeed, the frequency distribution of the GEBVs approaches a normal distribution in both selection cycles.

Fig. 5.1 Distribution of the genomic estimated breeding values (GEBVs) associated with trait T1 in (a) the first and (b) the fifth selection cycles in the testing population

#### 5.1.3 The LGSI and Its Parameters

Similar to the LPSI (Chap. 2), the objective of the LGSI is to predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$, where $\mathbf{g}' = [g_1 \;\; g_2 \;\; \ldots \;\; g_t]$ ($t$ = number of traits) is a vector of unobservable true breeding values and $\mathbf{w}' = [w_1 \;\; w_2 \;\; \ldots \;\; w_t]$ is a vector of economic weights. Suppose that the genomic breeding values $\boldsymbol{\gamma}_i = \mathbf{X}\mathbf{u}_i$ are known; then, the LGSI can be written as

$$I_G = \boldsymbol{\beta}'\boldsymbol{\gamma},\tag{5.10}$$

where $\boldsymbol{\beta}$ is an unknown vector of weights.

The main advantage of the LGSI over the LPSI lies in the possibility of reducing the intervals between selection cycles ($L_G$) by more than two thirds (Lorenz et al. 2011); thus, this parameter should be incorporated into the LGSI selection response and the expected genetic gain per trait to reflect the main advantage of the LGSI over the LPSI and the other indices. Assuming that $L_G = 1$, in the LPSI context we wrote the selection response as $R_I = k_I\sigma_H\rho_{HI}$; however, if $L_G \neq 1$, the LGSI selection response can be written as

$$R_{I_G} = \frac{k_I}{L_G} \frac{\sigma_{HI_G}}{\sqrt{\sigma_{I_G}^2}} = \frac{k_I}{L_G} \sigma_H \rho_{HI_G},\tag{5.11}$$

where $k_I$ is the standardized selection differential (or selection intensity) associated with the LGSI, $\sigma_{HI_G}$ is the covariance between $H = \mathbf{w}'\mathbf{g}$ and the LGSI, $\sigma^2_{I_G}$ is the variance of the LGSI, $\sigma_H$ is the standard deviation of $H$, $\rho_{HI_G}$ is the correlation between $H$ and the LGSI, and $L_G$ denotes the intervals between selection cycles.

Let $\mathbf{C}$ and $\boldsymbol{\Gamma}$ be the covariance matrices of the breeding values ($\mathbf{g}$) and of the genomic breeding values ($\boldsymbol{\gamma}$) respectively; then, the correlation between $H = \mathbf{w}'\mathbf{g}$ and $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$ can be written as

$$\rho_{HI_G} = \frac{\mathbf{w}' \boldsymbol{\Gamma} \boldsymbol{\beta}}{\sqrt{\mathbf{w}' \mathbf{C} \mathbf{w}} \sqrt{\boldsymbol{\beta}' \boldsymbol{\Gamma} \boldsymbol{\beta}}},\tag{5.12}$$

where $\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \sigma_{HI_G}$ is the covariance between $H = \mathbf{w}'\mathbf{g}$ and $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$, $\sigma_H = \sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$ is the standard deviation of $H = \mathbf{w}'\mathbf{g}$, and $\sigma_{I_G} = \sqrt{\boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}}$ is the standard deviation of $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$.

#### 5.1.4 Maximizing LGSI Parameters

To maximize the genomic selection response (Eq. 5.11), suppose that $k_I$, $\sigma_H$, and $L_G$ are fixed and take the derivative of the natural logarithm (ln) of the correlation between $H$ and $I_G$ (Eq. 5.12) with respect to vector $\boldsymbol{\beta}$, equate the result of the derivative to the null vector, and isolate $\boldsymbol{\beta}$, i.e.,

$$\frac{\partial}{\partial \boldsymbol{\beta}} \ln \rho_{HI_G} = \frac{\partial}{\partial \boldsymbol{\beta}} \ln \left( \frac{\mathbf{w}^{\prime} \boldsymbol{\Gamma} \boldsymbol{\beta}}{\sqrt{\mathbf{w}^{\prime} \mathbf{C} \mathbf{w}} \sqrt{\boldsymbol{\beta}^{\prime} \boldsymbol{\Gamma} \boldsymbol{\beta}}} \right) = \mathbf{0}. \tag{5.13}$$

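The result is $\boldsymbol{\beta} = s\mathbf{w}$, where $s = \boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}/\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta}$ is a proportionality constant that does not affect the maximum value of $\rho_{HI_G}$, because the correlation is invariant to changes of scale; then, assuming that $\boldsymbol{\beta} = \mathbf{w}$, the maximized LGSI selection response can be written as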

$$R\_{I\_G} = \frac{k\_I}{L\_G} \sqrt{\mathbf{w}' \Gamma \mathbf{w}}.\tag{5.14}$$
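The step from Eq. (5.13) to $\boldsymbol{\beta} = s\mathbf{w}$ can be spelled out as follows, assuming that $\boldsymbol{\Gamma}$ is symmetric and nonsingular (the intermediate lines are our own expansion of the derivative, not taken from the original text):

$$
\begin{aligned}
\ln \rho_{HI_G} &= \ln(\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta}) - \tfrac{1}{2}\ln(\mathbf{w}'\mathbf{C}\mathbf{w}) - \tfrac{1}{2}\ln(\boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}), \\
\frac{\partial}{\partial \boldsymbol{\beta}} \ln \rho_{HI_G} &= \frac{\boldsymbol{\Gamma}\mathbf{w}}{\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta}} - \frac{\boldsymbol{\Gamma}\boldsymbol{\beta}}{\boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}} = \mathbf{0}
\;\;\Longrightarrow\;\;
\boldsymbol{\Gamma}\boldsymbol{\beta} = \frac{\boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}}{\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta}}\,\boldsymbol{\Gamma}\mathbf{w}
\;\;\Longrightarrow\;\;
\boldsymbol{\beta} = s\,\mathbf{w}.
\end{aligned}
$$

Substituting $\boldsymbol{\beta} = \mathbf{w}$ into Eq. (5.12) gives $\rho_{HI_G} = \sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}/\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$, and multiplying by $k_I\sigma_H/L_G$ as in Eq. (5.11) yields Eq. (5.14).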

Hereafter, we refer to the LGSI genomic selection response as that of Eq. (5.14). Also, because $\boldsymbol{\beta} = \mathbf{w}$, Eq. (5.12) can be written as

$$
\rho_{HI_G} = \frac{\sqrt{\mathbf{w}^{\prime}\boldsymbol{\Gamma}\mathbf{w}}}{\sqrt{\mathbf{w}^{\prime}\mathbf{C}\mathbf{w}}} = \frac{\sigma_{I_G}}{\sigma_{H}},\tag{5.15}
$$

which is the maximized correlation between $H = \mathbf{w}'\mathbf{g}$ and $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$, or LGSI accuracy; $\sigma_H = \sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$ is the standard deviation of $H$, and $\sigma_{I_G} = \sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}$ is the standard deviation of $I_G$ when $\boldsymbol{\beta} = \mathbf{w}$.

The LGSI expected genetic gain per trait ($\mathbf{E}_{I_G}$) can be written as

$$\mathbf{E}\_{I\_G} = \frac{k\_I}{L\_G} \frac{\Gamma \mathbf{w}}{\sqrt{\mathbf{w}' \Gamma \mathbf{w}}}.\tag{5.16}$$

All the terms in Eq. (5.16) were previously defined.

Let $\lambda_G = \dfrac{\rho_{HI_G}}{\rho_{HI}}$ be LGSI efficiency versus LPSI efficiency to predict the net genetic merit, where $\rho_{HI_G}$ is the LGSI accuracy and $\rho_{HI}$ the LPSI accuracy; in percentage terms, LGSI efficiency versus LPSI efficiency for each selection cycle can be written as

$$p\_G = 100(\lambda\_G - 1). \tag{5.17}$$

According to Eq. (5.17), if $p_G > 0$, LGSI efficiency is greater than LPSI efficiency; if $p_G = 0$, the efficiency of both selection indices is equal, and if $p_G < 0$, the LPSI is more efficient than the LGSI at predicting $H = \mathbf{w}'\mathbf{g}$.

Equation (5.17) is useful for measuring LGSI efficiency in terms of accuracy when predicting the net genetic merit ($H = \mathbf{w}'\mathbf{g}$), whereas the Technow et al. (2013) inequality measures LGSI efficiency in terms of the time needed to complete one selection cycle. In the context of the LGSI and the LPSI, the Technow inequality can be written as

$$L\_G < \frac{\rho\_{HI\_G}}{h\_I} L\_P,\tag{5.18}$$

where $L_G$ and $L_P$ denote the time required to complete one selection cycle for the LGSI and the LPSI respectively, $\rho_{HI_G}$ is the LGSI accuracy, and $h_I$ is the square root of the heritability (Lin and Allaire 1977; Nordskog 1978) of the LPSI, which can be denoted as $h_I = \sqrt{\dfrac{\mathbf{b}'\mathbf{C}\mathbf{b}}{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ (see Chap. 2 for details). Then, assuming that the selection intensity is the same for both selection indices, if Eq. (5.18) is true, the LGSI is more efficient than the LPSI per unit of time.
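The LGSI parameters of Eqs. (5.14) to (5.18) are simple functions of $\boldsymbol{\Gamma}$, $\mathbf{C}$, and $\mathbf{w}$, as the following sketch shows. The matrices, weights, cycle lengths, and function names below are hypothetical and serve only to show how the quantities fit together.

```python
import numpy as np

def lgsi_parameters(Gamma, C, w, k_I, L_G=1.0):
    """Selection response (5.14), accuracy (5.15) and expected gains (5.16)."""
    sigma_IG = np.sqrt(w @ Gamma @ w)              # std. dev. of the LGSI
    sigma_H = np.sqrt(w @ C @ w)                   # std. dev. of H
    response = k_I * sigma_IG / L_G
    accuracy = sigma_IG / sigma_H
    gains = k_I * (Gamma @ w) / (L_G * sigma_IG)
    return response, accuracy, gains

def efficiency_percent(rho_lgsi, rho_lpsi):
    """Eq. (5.17): p_G = 100 (lambda_G - 1)."""
    return 100.0 * (rho_lgsi / rho_lpsi - 1.0)

def technow_inequality_holds(L_G, L_P, rho_lgsi, h_I):
    """Eq. (5.18): True when L_G < (rho_HIG / h_I) L_P."""
    return L_G < rho_lgsi / h_I * L_P

# Hypothetical inputs for two traits
Gamma_demo = np.array([[3.0, 1.0], [1.0, 2.0]])    # genomic covariance
C_demo = np.array([[5.0, 1.5], [1.5, 4.0]])        # breeding value covariance
w_demo = np.array([1.0, -1.0])                     # economic weights
R_G, rho_G, E_G = lgsi_parameters(Gamma_demo, C_demo, w_demo, k_I=1.755, L_G=1.5)
```

A useful check is that the economic-weighted sum of the expected gains per trait, $\mathbf{w}'\mathbf{E}_{I_G}$, equals the selection response of Eq. (5.14). The last function makes the time comparison explicit: for example, with $\rho_{HI_G} = 0.6$, $h_I = 0.8$, and $L_P = 4$, the LGSI is more efficient per unit of time whenever $L_G < 3$.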

#### 5.1.5 Relationship Between the LGSI and LPSI Selection Responses

To obtain the relationship between $R_{I_G}$ and $R_I$ in the asymptotic context, we omitted the intervals between selection cycles ($L_G$ and $L_I$ respectively) to simplify the algebra. Consider a population where the number of genotypes and markers tends to infinity; in this case, markers explain most of the true additive genetic variances and covariances. Thus, we can assume that matrices $\Gamma$ and $\mathbf{C}$ are very similar, and at the limit, $\Gamma = \mathbf{C}$. Now suppose that in this population the phenotypic variance–covariance matrix ($\mathbf{P}$) is known and comprises matrix $\Gamma$ and the residual variance–covariance matrix ($\mathbf{R}$). In this case, the inverse of $\mathbf{P}$ can be written as $\mathbf{P}^{-1} = (\Gamma + \mathbf{R})^{-1} = \Gamma^{-1} - \Gamma^{-1}(\Gamma^{-1} + \mathbf{R}^{-1})^{-1}\Gamma^{-1}$, where $\Gamma^{-1}$ and $\mathbf{R}^{-1}$ are the inverses of matrices $\Gamma$ and $\mathbf{R}$ respectively. Thus, the LPSI selection response is given by

$$R_I = k_I \sqrt{\mathbf{b}' \mathbf{P} \mathbf{b}} = k_I \sqrt{\mathbf{w}' \Gamma \mathbf{P}^{-1} \Gamma \mathbf{w}} = k_I \sqrt{\mathbf{w}' \Gamma \mathbf{w} - \mathbf{w}' \left(\Gamma^{-1} + \mathbf{R}^{-1}\right)^{-1} \mathbf{w}}, \tag{5.19}$$

where $\mathbf{b} = \mathbf{P}^{-1}\Gamma\mathbf{w}$ is the vector of coefficients of the LPSI in the asymptotic context. Note that $\mathbf{b}'\mathbf{P}\mathbf{b} \ge 0$ and $\mathbf{w}'\Gamma\mathbf{w} \ge 0$, i.e., the quadratic forms $\mathbf{b}'\mathbf{P}\mathbf{b}$ and $\mathbf{w}'\Gamma\mathbf{w}$ are positive semi-definite, meaning that $\mathbf{w}'\Gamma\mathbf{w} \ge \mathbf{w}'(\Gamma^{-1} + \mathbf{R}^{-1})^{-1}\mathbf{w} \ge 0$; then, in the asymptotic context, $R_{I_G} \ge R_I$. This result is not common when the number of genotypes and markers is small; however, it gives an idea of the theoretical behavior of $R_{I_G}$ with respect to $R_I$ when the number of markers and genotypes is very large.
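The last equality in Eq. (5.19) follows from substituting the expansion of $\mathbf{P}^{-1}$ into the quadratic form; written out, the step is

$$
\begin{aligned}
\mathbf{w}'\Gamma\mathbf{P}^{-1}\Gamma\mathbf{w}
&= \mathbf{w}'\Gamma\left[\Gamma^{-1} - \Gamma^{-1}\left(\Gamma^{-1} + \mathbf{R}^{-1}\right)^{-1}\Gamma^{-1}\right]\Gamma\mathbf{w} \\
&= \mathbf{w}'\Gamma\mathbf{w} - \mathbf{w}'\left(\Gamma^{-1} + \mathbf{R}^{-1}\right)^{-1}\mathbf{w}.
\end{aligned}
$$

As a check, consider a single trait with genomic variance $\gamma$ and residual variance $r$ (scalars introduced here only for illustration). Then $\gamma - \dfrac{\gamma r}{\gamma + r} = \dfrac{\gamma^2}{\gamma + r}$, which is exactly $\Gamma\mathbf{P}^{-1}\Gamma$ in the scalar case, and the ratio of the two responses is $R_I / R_{I_G} = \sqrt{\gamma/(\gamma + r)} \le 1$.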

Because $g_q$ can be written as $g_q = \gamma_q + \eta_q$, where $\eta_q = g_q - \gamma_q$ ($q = 1, 2, \ldots, t$), for low numbers of markers and genotypes, the genotypic covariance matrix $\mathbf{C}$ can be written as $\mathbf{C} = \Gamma + \mathbf{E}$, where $\mathbf{E} = \mathbf{C} - \Gamma$; then, the inverse of matrix $\mathbf{P}$ can be written as $\mathbf{P}^{-1} = [(\Gamma + \mathbf{E}) + \mathbf{R}]^{-1} = (\Gamma + \mathbf{E})^{-1} - (\Gamma + \mathbf{E})^{-1}[(\Gamma + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}(\Gamma + \mathbf{E})^{-1}$. In the latter case, the LPSI selection response $R_I$ can be written as

$$\begin{split} R\_I &= k\_I \sqrt{\mathbf{w}'(\Gamma + \mathbf{E})\mathbf{P}^{-1}(\Gamma + \mathbf{E})\mathbf{w}} \\ &= k\_I \sqrt{\mathbf{w}'\Gamma\mathbf{w} + \mathbf{w}'\mathbf{E}\mathbf{w} - \mathbf{w}'\left[\left(\Gamma + \mathbf{E}\right)^{-1} + \mathbf{R}^{-1}\right]^{-1}\mathbf{w}}. \end{split} \tag{5.20}$$

Equation (5.20) indicates that in the non-asymptotic context (low numbers of markers and genotypes), $R_{I_G}$ and $R_I$ are related in three possible ways:

1. $R_I > R_{I_G}$ if $\mathbf{w}'\mathbf{E}\mathbf{w} > \mathbf{w}'[(\Gamma + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}$
2. $R_I = R_{I_G}$ if $\mathbf{w}'\mathbf{E}\mathbf{w} = \mathbf{w}'[(\Gamma + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}$
3. $R_{I_G} > R_I$ if $\mathbf{w}'\mathbf{E}\mathbf{w} < \mathbf{w}'[(\Gamma + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}$

The second and third points indicate that $R_{I_G}$ may be equal to or larger than $R_I$, even with a small number of markers, depending on the size of $\mathbf{w}'\mathbf{E}\mathbf{w}$ and $\mathbf{w}'[(\Gamma + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}$. These three points explain the theoretical relationship between $R_I$ and $R_{I_G}$ for a low number of markers and genotypes. When $\Gamma = \mathbf{C}$, $\mathbf{E} = \mathbf{0}$ and $R_I = k_I\sqrt{\mathbf{w}'\Gamma\mathbf{w} - \mathbf{w}'(\Gamma^{-1} + \mathbf{R}^{-1})^{-1}\mathbf{w}}$, so that $R_{I_G} \ge R_I$.
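To make the three cases concrete, the sketch below evaluates Eq. (5.20) for a single trait, where every matrix collapses to a scalar. The single-trait reduction, the function name, and all numeric values are assumptions made for illustration; the cycle lengths are omitted, as in the derivation.

```python
import math


def single_trait_responses(gamma: float, e: float, r: float, w: float = 1.0, k: float = 1.755):
    """Return (R_I, R_IG) for one trait, ignoring cycle length.

    gamma: genomic variance, e: C - Gamma, r: residual variance, w: economic weight.
    """
    c = gamma + e                              # genotypic variance C = Gamma + E
    shrink = 1.0 / (1.0 / c + 1.0 / r)         # [(Gamma + E)^-1 + R^-1]^-1
    r_lpsi = k * math.sqrt(w * w * (gamma + e - shrink))   # Eq. (5.20)
    r_lgsi = k * math.sqrt(w * w * gamma)                  # k * sqrt(w' Gamma w)
    return r_lpsi, r_lgsi


# Hypothetical variances: markers capture all (e = 0), most, or little of C.
for e in (0.0, 0.5, 3.0):
    r_i, r_ig = single_trait_responses(gamma=1.0, e=e, r=2.0)
    relation = ">" if r_i > r_ig else "<" if r_i < r_ig else "="
    print(f"e = {e}: R_I = {r_i:.3f} {relation} R_IG = {r_ig:.3f}")
```

With these values the LPSI response only overtakes the LGSI response once the part of the genotypic variance not captured by markers ($e$) exceeds the shrinkage term, which is case 1 of the list.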

#### 5.1.6 Statistical LGSI Properties

Assuming that $H$ and $I_G$ have a joint bivariate normal distribution and that $\Gamma$, $\mathbf{C}$, and $\mathbf{w}$ are known, the LGSI has the following properties:


The LGSI properties described in points 1–4 of this subsection are the same as the LPSI properties described in Chap. 2. This corroborates the LGSI as an application of the LPSI theory to the genomic selection context.

#### 5.1.7 Genomic Covariance Matrix in the Training and Testing Population

To derive the LGSI theory, we assumed that the true genomic additive variance–covariance matrix $\Gamma$ was known. However, in practice, we need to estimate it. In the training population, matrix $\Gamma$ can be estimated by restricted maximum likelihood (REML) using phenotypic and genomic information, as described by Vattikuti et al. (2012) and Su et al. (2012). In Eqs. (2.22) to (2.24) of Chap. 2, we presented the formulas for estimating the genotypic and residual variance and covariance based on the formulas described by Lynch and Walsh (1998). Here, we present a brief description of how we can estimate the $q$th diagonal component ($\sigma_{\gamma_{qq}}$) of $\Gamma$ in the training population using the REML method.

We estimated $\sigma_{\gamma_{qq}} = \sigma^2_{\gamma_q}$ ($q, q' = 1, 2, \ldots, t$; $t$ = number of traits) in the absence of dominance and epistatic effects, using the model $\mathbf{y}_q = \mathbf{1}\mu_q + \mathbf{Z}\boldsymbol{\gamma}_q + \boldsymbol{\varepsilon}_q$, where the $g \times 1$ vector $\mathbf{y}_q \sim \text{NMV}(\mathbf{1}\mu_q, \mathbf{V}_q)$ ($g$ = number of genotypes in the population) had a multivariate normal distribution; $\mathbf{1}$ was a $g \times 1$ vector of 1s; $\mu_q$ was the mean of the $q$th trait; $\mathbf{Z}$ was a $g \times g$ identity matrix; $\boldsymbol{\gamma}_q \sim \text{NMV}(\mathbf{0}, \mathbf{G}\sigma^2_{\gamma_q})$ was a vector of genomic breeding values; and $\boldsymbol{\varepsilon}_q \sim \text{NMV}(\mathbf{0}, \mathbf{I}\sigma^2_{\varepsilon_q})$ was a $g \times 1$ vector of residuals. Matrix $\mathbf{G} = \mathbf{X}\mathbf{X}'/c$ was the genomic relationship matrix, and in an F2 population, $c = \sum_{j=1}^{N} 2p_jq_j$; $\mathbf{X}$ was a $g \times m$ matrix ($m$ = number of markers) of the coded marker values ($2 - 2p$ for $AA$, $1 - 2p$ for $Aa$, and $-2p$ for $aa$) for the additive effects of the markers; $p$ and $q$ denote the frequency of allele $A$ and the frequency of allele $a$ in the $j$th marker ($j = 1, 2, \ldots, m$), and $\mathbf{V}_q = \mathbf{G}\sigma^2_{\gamma_q} + \mathbf{I}\sigma^2_{\varepsilon_q}$.

The expectation–maximization algorithm allowed the REML estimates of the variance components $\sigma^2_{\gamma_q}$ and $\sigma^2_{\varepsilon_q}$ to be computed by iterating the following equations:

$$
\sigma\_{\rm \gamma q}^{2(n+1)} = \sigma\_{\rm \gamma q}^{2(n)} + \frac{\left(\sigma\_{\rm \gamma q}^{2(n)}\right)^2}{g} \left[ \mathbf{y}\_q' \left( \mathbf{T}^{(n)} \mathbf{G} \mathbf{T}^{(n)} \right) \mathbf{y}\_q - tr \left( \mathbf{T}^{(n)} \mathbf{G} \right) \right] \tag{5.21}
$$

and

$$
\sigma\_{\varepsilon\_q}^{2(n+1)} = \sigma\_{\varepsilon\_q}^{2(n)} + \frac{\left(\sigma\_{\varepsilon\_q}^{2(n)}\right)^2}{g} \left[ \mathbf{y}\_q' \left( \mathbf{T}^{(n)} \mathbf{T}^{(n)} \right) \mathbf{y}\_q - tr \left( \mathbf{T}^{(n)} \right) \right], \tag{5.22}
$$

where $g$ is the number of genotypes. After $n$ iterations, when $\sigma^{2(n+1)}_{\gamma_q}$ was very similar to $\sigma^{2(n)}_{\gamma_q}$ and $\sigma^{2(n+1)}_{\varepsilon_q}$ was very similar to $\sigma^{2(n)}_{\varepsilon_q}$, $\sigma^{2(n+1)}_{\gamma_q}$ and $\sigma^{2(n+1)}_{\varepsilon_q}$ were the estimated variance components of $\sigma^2_{\gamma_q}$ and $\sigma^2_{\varepsilon_q}$ respectively. In Eqs. (5.21) and (5.22), $tr(\cdot)$ denoted the trace of the matrices within brackets; $\mathbf{T} = \mathbf{V}_q^{-1} - \mathbf{V}_q^{-1}\mathbf{1}(\mathbf{1}'\mathbf{V}_q^{-1}\mathbf{1})^{-1}\mathbf{1}'\mathbf{V}_q^{-1}$, and $\mathbf{V}_q^{-1}$ was the inverse of $\mathbf{V}_q = \mathbf{G}\sigma^2_{\gamma_q} + \mathbf{I}\sigma^2_{\varepsilon_q}$. In matrix $\mathbf{T}^{(n)}$, $\mathbf{V}_q^{-1(n)}$ was the inverse of matrix $\mathbf{V}_q^{(n)} = \mathbf{G}\sigma^{2(n)}_{\gamma_q} + \mathbf{I}\sigma^{2(n)}_{\varepsilon_q}$.
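The iteration in Eqs. (5.21) and (5.22) can be sketched for one trait in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it takes $\mathbf{Z} = \mathbf{I}$ as in the model above, rebuilds $\mathbf{T}^{(n)}$ from the current variance components at every step, and uses a small Gauss-Jordan inverse so that no external library is needed. The function names, starting values, stopping rule, and toy data are all hypothetical; for realistic numbers of genotypes a numerical linear algebra library would be the sensible choice.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def em_reml_step(y, G, s2_gamma, s2_eps):
    """One update of (sigma2_gamma, sigma2_eps) following Eqs. (5.21)-(5.22)."""
    g = len(y)
    V = [[s2_gamma * G[i][j] + (s2_eps if i == j else 0.0) for j in range(g)]
         for i in range(g)]
    Vi = invert(V)
    Vi1 = [sum(row) for row in Vi]                 # V^-1 1 (V^-1 is symmetric)
    denom = sum(Vi1)                               # 1' V^-1 1
    T = [[Vi[i][j] - Vi1[i] * Vi1[j] / denom for j in range(g)] for i in range(g)]
    Ty = [sum(T[i][j] * y[j] for j in range(g)) for i in range(g)]
    GTy = [sum(G[i][j] * Ty[j] for j in range(g)) for i in range(g)]
    quad_gamma = sum(a * b for a, b in zip(Ty, GTy))   # y' T G T y
    quad_eps = sum(a * a for a in Ty)                  # y' T T y
    tr_TG = sum(T[i][j] * G[j][i] for i in range(g) for j in range(g))
    tr_T = sum(T[i][i] for i in range(g))
    new_gamma = s2_gamma + s2_gamma ** 2 / g * (quad_gamma - tr_TG)
    new_eps = s2_eps + s2_eps ** 2 / g * (quad_eps - tr_T)
    return new_gamma, new_eps


def em_reml(y, G, s2_gamma=1.0, s2_eps=1.0, tol=1e-8, max_iter=500):
    """Iterate until both components change by less than tol (or max_iter is hit)."""
    for _ in range(max_iter):
        new_gamma, new_eps = em_reml_step(y, G, s2_gamma, s2_eps)
        converged = abs(new_gamma - s2_gamma) < tol and abs(new_eps - s2_eps) < tol
        s2_gamma, s2_eps = new_gamma, new_eps
        if converged:
            break
    return s2_gamma, s2_eps


# Toy data: four genotypes in two related pairs (hypothetical G and phenotypes).
G = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
y = [1.2, 0.8, -0.5, -1.1]
s2_gamma_hat, s2_eps_hat = em_reml(y, G)
print(s2_gamma_hat, s2_eps_hat)
```

Expectation–maximization updates of this kind tend to converge slowly, so the iteration cap and tolerance are worth tuning to the data at hand.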

The genomic additive genetic covariance between the observations of the $q$th and $i$th traits, $\mathbf{y}_q$ and $\mathbf{y}_i$ ($\sigma_{\gamma_{qi}}$; $q, i = 1, 2, \ldots, t$), can be estimated by REML. Here, we adapted Eqs. (5.21) and (5.22) using the variance of the sum of $\mathbf{y}_q$ and $\mathbf{y}_i$, i.e., $Var(\mathbf{y}_i + \mathbf{y}_q) = \mathbf{V}_i + \mathbf{V}_q + 2\mathbf{C}_{iq}$, where $\mathbf{V}_i = \mathbf{G}\sigma^2_{\gamma_i} + \mathbf{I}\sigma^2_{\varepsilon_i} = Var(\mathbf{y}_i)$ is the variance of $\mathbf{y}_i$ and $\mathbf{V}_q = \mathbf{G}\sigma^2_{\gamma_q} + \mathbf{I}\sigma^2_{\varepsilon_q} = Var(\mathbf{y}_q)$ is the variance of $\mathbf{y}_q$; $2\mathbf{C}_{iq} = 2\mathbf{G}\sigma_{\gamma_{iq}} + 2\mathbf{I}\sigma_{\varepsilon_{iq}} = 2Cov(\mathbf{y}_i, \mathbf{y}_q)$ is twice the covariance of $\mathbf{y}_q$ and $\mathbf{y}_i$, and $\sigma_{\gamma_{iq}}$ and $\sigma_{\varepsilon_{iq}}$ are the genomic and residual covariances respectively, associated with $\mathbf{y}_i$ and $\mathbf{y}_q$. Thus, one way of estimating $\sigma_{\gamma_{iq}}$ and $\sigma_{\varepsilon_{iq}}$ is by using the following equation:

$$0.5Var(\mathbf{y}\_i + \mathbf{y}\_q) - 0.5Var(\mathbf{y}\_i) - 0.5Var(\mathbf{y}\_q),\tag{5.23}$$

for which Eqs. (5.21) and (5.22) can be adapted.
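In practice, Eq. (5.23) means running the variance-component estimation three times (for $\mathbf{y}_i$, for $\mathbf{y}_q$, and for their sum) and combining the results. A minimal sketch of that last step follows; the function name and the numeric values are hypothetical, and the same call applies to the genomic and to the residual components separately.

```python
def covariance_from_variances(var_sum: float, var_i: float, var_q: float) -> float:
    """Eq. (5.23): 0.5*Var(y_i + y_q) - 0.5*Var(y_i) - 0.5*Var(y_q)."""
    return 0.5 * var_sum - 0.5 * var_i - 0.5 * var_q


# Hypothetical genomic variance components estimated for y_i, y_q and y_i + y_q.
sigma_gamma_iq = covariance_from_variances(var_sum=6.0, var_i=2.0, var_q=3.0)
print(sigma_gamma_iq)
```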

If there is only marker information on the testing population, then it is not possible to estimate $\Gamma$ using Eqs. (5.21) to (5.23). Another way of estimating $\Gamma$ is to use the method proposed by Ceron-Rojas et al. (2015), which requires the estimated values of $\boldsymbol{\gamma}_q$ ($\widehat{\boldsymbol{\gamma}}_q$) in the cycle of interest. Let $\widehat{\mathbf{u}}$ be the estimator of the vector of marker effects $\mathbf{u}' = [\mathbf{u}'_1 \;\; \mathbf{u}'_2 \;\; \cdots \;\; \mathbf{u}'_t]$ for $t$ traits obtained in the training population. We obtained the $q$th GEBVs ($q = 1, 2, \ldots, t$) in the $l$th selection cycle ($l = 1, 2, \ldots$, number of cycles) as

$$
\widehat{\boldsymbol{\gamma}}_{ql} = \mathbf{X}_l \widehat{\mathbf{u}}_q, \tag{5.24}
$$

where $\widehat{\mathbf{u}}_q$ is the $m \times 1$ vector of the estimated marker effects of the $q$th trait in the training population and $\mathbf{X}_l$ is an $n \times m$ matrix of the coded values of marker genotypes in the $l$th selection cycle of the testing population.
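Equation (5.24) is a plain matrix-vector product: each GEBV is the sum of a genotype's coded marker values weighted by the estimated marker effects. The sketch below illustrates this with made-up data; the function name, the coded values, and the marker effects are all hypothetical.

```python
def gebv(X, u_hat):
    """Eq. (5.24): return X_l u_hat, one GEBV per genotype (row of X_l)."""
    if any(len(row) != len(u_hat) for row in X):
        raise ValueError("each row of X must have one coded value per marker")
    return [sum(x * u for x, u in zip(row, u_hat)) for row in X]


# Hypothetical coded marker values (3 genotypes x 4 markers) and marker effects.
X_l = [[1, -1, 0, 1],
       [0, 1, 1, -1],
       [-1, 0, 1, 0]]
u_hat_q = [0.25, -0.5, 0.125, 0.5]
gamma_hat = gebv(X_l, u_hat_q)
print(gamma_hat)
```

No phenotypes from the testing population enter the calculation; only the marker effects carried over from the training population are needed.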

Now suppose that $\boldsymbol{\gamma}_q$ and $\boldsymbol{\gamma}_{q'}$ have a joint multivariate normal distribution, with means $\mathbf{1}\mu_{\gamma_q}$ and $\mathbf{1}\mu_{\gamma_{q'}}$ respectively, and covariance matrix $\mathbf{G}\sigma_{\gamma_{qq'}}$, where $\mathbf{1}$ is an $n \times 1$ vector of 1s and $\mathbf{G} = \mathbf{X}\mathbf{X}'/c$ is the additive genomic relationship matrix. Then, $\Gamma = \{\sigma_{\gamma_{qq'}}\}$ can be estimated as

$$
\widehat{\Gamma}_l = \left\{ \widehat{\sigma}_{\gamma_{qq'}} \right\}, \tag{5.25}
$$

where $\widehat{\sigma}_{\gamma_{qq'}} = \dfrac{1}{g}\left(\widehat{\boldsymbol{\gamma}}_{ql} - \mathbf{1}\widehat{\mu}_{\gamma_{ql}}\right)'\mathbf{G}_l^{-1}\left(\widehat{\boldsymbol{\gamma}}_{q'l} - \mathbf{1}\widehat{\mu}_{\gamma_{q'l}}\right)$ is the estimated covariance between $\boldsymbol{\gamma}_q$ and $\boldsymbol{\gamma}_{q'}$ in the $l$th selection cycle of the testing population; $g$ is the number of genotypes; $\widehat{\boldsymbol{\gamma}}_{ql}$ was defined in Eq. (5.24); $\widehat{\mu}_{\gamma_{ql}}$ and $\widehat{\mu}_{\gamma_{q'l}}$ are the estimated arithmetic means of the values of $\widehat{\boldsymbol{\gamma}}_{ql}$ and $\widehat{\boldsymbol{\gamma}}_{q'l}$; $\mathbf{1}$ is a $g \times 1$ vector of 1s; and $\mathbf{G}_l = c^{-1}\mathbf{X}_l\mathbf{X}'_l$ is the additive genomic relationship matrix in the $l$th selection cycle ($l = 1, 2, \ldots$, number of cycles) in the testing population.
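The estimator behind Eq. (5.25) can be sketched as a centered quadratic form in $\mathbf{G}_l^{-1}$. In the sketch below the caller is assumed to supply $\mathbf{G}_l^{-1}$ directly; the function names and the data are hypothetical, and the toy example takes $\mathbf{G}_l = \mathbf{I}$ (unrelated individuals), in which case the estimator reduces to the ordinary covariance with divisor $g$.

```python
def genomic_covariance(gebv_q, gebv_q2, G_inv):
    """Estimate sigma_gamma_qq' = (1/g) (gq - 1*mean)' G^-1 (gq' - 1*mean')."""
    g = len(gebv_q)
    mean_q = sum(gebv_q) / g
    mean_q2 = sum(gebv_q2) / g
    dq = [v - mean_q for v in gebv_q]
    dq2 = [v - mean_q2 for v in gebv_q2]
    G_inv_dq2 = [sum(G_inv[i][j] * dq2[j] for j in range(g)) for i in range(g)]
    return sum(a * b for a, b in zip(dq, G_inv_dq2)) / g


def genomic_cov_matrix(gebvs, G_inv):
    """Assemble Gamma-hat (Eq. 5.25) from a list of per-trait GEBV vectors."""
    t = len(gebvs)
    return [[genomic_covariance(gebvs[q], gebvs[r], G_inv) for r in range(t)]
            for q in range(t)]


# Hypothetical GEBVs for two traits on four genotypes; G_l = I (unrelated individuals).
identity = [[float(i == j) for j in range(4)] for i in range(4)]
gebvs = [[1.0, 2.0, 3.0, 4.0],
         [2.0, 1.0, 4.0, 5.0]]
gamma_hat = genomic_cov_matrix(gebvs, identity)
print(gamma_hat)
```

With real marker data $\mathbf{G}_l$ is often singular or poorly conditioned, so how its inverse is obtained is a separate practical question that this sketch does not address.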

From Eq. (5.25) we can estimate the LGSI response and expected genetic gain per trait in the testing population as

$$
\widehat{R}\_{I\_G} = \frac{k\_I}{L\_G} \sqrt{\mathbf{w}' \widehat{\mathbf{\Gamma}} \mathbf{w}} \quad \text{and} \quad \widehat{\mathbf{E}}\_{I\_G} = \frac{k\_I}{L\_G} \frac{\widehat{\mathbf{\Gamma}} \mathbf{w}}{\sqrt{\mathbf{w}' \widehat{\mathbf{\Gamma}} \mathbf{w}}}, \tag{5.26}
$$

respectively. The estimated LGSI ($\widehat{I}_G$) values in the $l$th selection cycle can be obtained as

$$
\widehat{I}_G = \sum_{q=1}^{t} w_q \widehat{\boldsymbol{\gamma}}_{ql}, \tag{5.27}
$$

where $w_q$ is the $q$th economic weight and $\widehat{\boldsymbol{\gamma}}_{ql}$ was defined in Eq. (5.24). Equation (5.27) is a vector of size $g \times 1$ ($g$ = number of genotypes). In practice, $\widehat{I}_G$ values are ranked to select individual genotypes with optimal GEBVs.
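Computing and ranking the index in Eq. (5.27) can be sketched as follows. The function names, the GEBVs, the economic weights, and the selected proportion are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def lgsi_values(gebvs_by_trait, weights):
    """Eq. (5.27): I_G = sum_q w_q * gamma_hat_q, one value per genotype."""
    n_genotypes = len(gebvs_by_trait[0])
    return [sum(w * trait[i] for w, trait in zip(weights, gebvs_by_trait))
            for i in range(n_genotypes)]


def select_top(index_values, proportion):
    """Return genotype positions ranked by index value, keeping the top proportion."""
    n_keep = max(1, round(proportion * len(index_values)))
    ranked = sorted(range(len(index_values)), key=lambda i: index_values[i], reverse=True)
    return ranked[:n_keep]


# Hypothetical GEBVs for three traits on five genotypes and hypothetical weights.
gebvs = [[0.5, -0.25, 1.0, 0.25, 0.75],     # trait 1
         [2.0, 1.0, -4.0, 0.0, 8.0],        # trait 2
         [4.0, -2.0, 6.0, 0.0, 2.0]]        # trait 3
w = [4.0, -0.5, -0.25]
index = lgsi_values(gebvs, w)
parents = select_top(index, proportion=0.4)
print(index, parents)
```

The returned positions identify the genotypes that would be kept as parents for the next cycle at the chosen selected proportion.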

#### 5.1.8 Numerical Examples

To estimate matrices $\mathbf{C}$ and $\mathbf{R}$ and the marker effects in the training population, we used a real maize (Zea mays) F2 population with 248 genotypes (each with two repetitions), 233 molecular markers, and three traits, grain yield (GY, ton ha$^{-1}$), ear height (EHT, cm), and plant height (PHT, cm), evaluated in one environment. The estimated matrices were

$$\widehat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{R}} = \begin{bmatrix} 0.38 & 0.72 & 1.27 \\ 0.72 & 47.14 & 60.96 \\ 1.27 & 60.96 & 121.46 \end{bmatrix},$$

which were estimated by Eqs. (5.21) to (5.23) using the numerator relationship matrix $\mathbf{A}$ instead of the genomic relationship matrix ($\mathbf{G} = \mathbf{X}\mathbf{X}'/c$).

Table 5.1 presents the first 20 BLUPs of the estimated marker effects (Eq. 5.8) in the training population and the first 20 marker coded values and GEBVs (Eq. 5.9) obtained in the testing population associated with trait GY. In the testing population, there were 380 genotypes and 233 molecular markers. In this population, the estimated genomic covariance matrix $\Gamma = \{\sigma_{\gamma_{qq'}}\}$ was

$$\widehat{\Gamma} = \begin{bmatrix} 0.21 & 2.95 & 5.00 \\ 2.95 & 42.41 & 71.11 \\ 5.00 & 71.11 & 121.53 \end{bmatrix}.$$

The first GEBV (0.195) related to GY in Table 5.1 was obtained as $0.195 = 0.0003(1) - 0.0038(1) - 0.0085(0) + \cdots - 0.03(1)$. The other GEBVs can be obtained in a similar manner.

Table 5.1 The 20 best linear unbiased predictors (BLUPs) of the estimated marker effects in the training population and the first 20 marker coded values and genomic estimated breeding values (GEBVs) obtained in the testing population associated with grain yield

Suppose a selection intensity of 10% ($k_I = 1.755$) and a vector of economic weights of $\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1]$; then, the estimated LGSI selection response and the expected genetic gain per trait, without including the interval between selection cycles, are $\widehat{R}_{I_G} = 1.755\sqrt{\mathbf{w}'\widehat{\Gamma}\mathbf{w}} = 0.92$ and $\widehat{\mathbf{E}}'_{I_G} = 1.755\dfrac{\mathbf{w}'\widehat{\Gamma}}{\sqrt{\mathbf{w}'\widehat{\Gamma}\mathbf{w}}} = [0.80 \;\; 11.41 \;\; 19.28]$ respectively, whereas the estimated LGSI accuracy was $\widehat{\rho}_{HI_G} = 0.48$.

Chapter 11 presents RIndSel, a graphical user interface that uses selection index theory to select individual candidates as parents for the next selection cycle, which can be used to obtain the results of the real numerical example described in this subsection.

To compare LGSI efficiency versus LPSI efficiency we used the simulated data described in Chap. 2, Sect. 2.8.1. According to Beyene et al. (2015), at least 4 years are required to complete one phenotypic selection cycle in maize, whereas genomic selection requires only 1.5 years. Thus, to compare LGSI efficiency versus LPSI efficiency in terms of time, we can use the Technow et al. (2013) inequality described in Eq. (5.18).

Table 5.2 presents the estimated value of Eq. (5.18) for five simulated selection cycles. The LGSI efficiency was higher than LPSI efficiency in terms of time, because the Technow et al. (2013) inequality was true in the five selection cycles. An additional result obtained by Ceron-Rojas et al. (2015) is presented in Fig. 5.2, which shows the correlation among the LGSI, the LPSI, and the true net genetic merit values in seven selection cycles. According to Fig. 5.2, the correlation between the LGSI and the true net genetic merit values was higher than the correlation between the LPSI and the true net genetic merit values for the first three selection cycles; after the third cycle, the correlation between the LGSI and the true net genetic merit values tended to decrease.

Time required for the linear genomic selection index ($L_G$) and linear phenotypic selection index ($L_P$) to complete one selection cycle; estimated accuracy ($\widehat{\rho}_{HI_G}$) of the linear genomic selection index and the square root of the estimated heritability of the linear phenotypic selection index ($\widehat{h}_I$); estimated right-hand side ($\frac{\widehat{\rho}_{HI_G}}{\widehat{h}_I}L_P$) of the inequality ($L_G < \frac{\rho_{HI_G}}{h_I}L_P$)

Fig. 5.2 Correlation between the linear genomic selection index (LGSI), the linear phenotypic selection index (LPSI), and true net genetic merit (H) values in seven selection cycles. For each selection cycle, the first column indicates the correlation between the LGSI estimated values and the H true values, whereas the second column shows the correlation between the LPSI estimated values and the H true values

#### 5.2 The Combined Linear Genomic Selection Index

The combined LGSI (CLGSI) developed by Dekkers (2007) is a slightly modified version of the LMSI (see Chap. 4 for details), which, instead of using the marker scores, uses the GEBVs and the phenotypic information jointly to predict the net genetic merit. The main difference between the CLGSI and the LGSI is that the CLGSI can only be used in training populations, whereas the LGSI is used in testing populations. The basic conditions for constructing a valid CLGSI include conditions for constructing the LPSI, the LMSI, and the LGSI, because the CLGSI uses GEBVs and phenotypic information jointly to predict the net genetic merit.

#### 5.2.1 The CLGSI Parameters

The net genetic merit can be written in a similar manner to that in the LMSI context, that is, as

$$H = \mathbf{w}'\mathbf{g} + \mathbf{w}'_2\boldsymbol{\gamma} = \begin{bmatrix} \mathbf{w}' & \mathbf{w}'_2 \end{bmatrix} \begin{bmatrix} \mathbf{g} \\ \boldsymbol{\gamma} \end{bmatrix} = \mathbf{a}'_C \mathbf{z}_C, \tag{5.28}$$

where $\mathbf{g}' = [g_1 \;\; \cdots \;\; g_t]$ is the vector of breeding values, $\mathbf{w}' = [w_1 \;\; \cdots \;\; w_t]$ is the vector of economic weights associated with the breeding values, $\mathbf{w}'_2 = [0_1 \;\; \cdots \;\; 0_t]$ is a null vector associated with the vector of genomic breeding values $\boldsymbol{\gamma}' = [\gamma_1 \;\; \gamma_2 \;\; \cdots \;\; \gamma_t]$, $\mathbf{a}'_C = [\mathbf{w}' \;\; \mathbf{w}'_2]$, and $\mathbf{z}'_C = [\mathbf{g}' \;\; \boldsymbol{\gamma}']$.

The CLGSI can be written as

$$I_C = \boldsymbol{\beta}'_{\mathbf{y}}\mathbf{y} + \boldsymbol{\beta}'_G\boldsymbol{\gamma} = \begin{bmatrix} \boldsymbol{\beta}'_{\mathbf{y}} & \boldsymbol{\beta}'_G \end{bmatrix} \begin{bmatrix} \mathbf{y} \\ \boldsymbol{\gamma} \end{bmatrix} = \boldsymbol{\beta}'_C \mathbf{t}_C, \tag{5.29}$$

where $\mathbf{y}' = [y_1 \;\; \cdots \;\; y_t]$ ($t$ = number of traits) is the vector of phenotypic values; $\boldsymbol{\gamma}$ was defined earlier; $\boldsymbol{\beta}_{\mathbf{y}}$ and $\boldsymbol{\beta}_G$ are vectors of coefficients of phenotypic and genomic weight values respectively; $\boldsymbol{\beta}'_C = [\boldsymbol{\beta}'_{\mathbf{y}} \;\; \boldsymbol{\beta}'_G]$ and $\mathbf{t}'_C = [\mathbf{y}' \;\; \boldsymbol{\gamma}']$.

The CLGSI selection response can be written as

$$R_C = k_I \sigma_H \rho_{HI_C} = k_I \sigma_H \frac{\mathbf{a}'_C \Psi_C \boldsymbol{\beta}_C}{\sqrt{\mathbf{a}'_C \Psi_C \mathbf{a}_C}\sqrt{\boldsymbol{\beta}'_C \mathbf{T}_C \boldsymbol{\beta}_C}}, \tag{5.30}$$

where $k_I$ is the standardized selection differential of the CLGSI; $\sigma^2_H = \mathbf{a}'_C\Psi_C\mathbf{a}_C$ and $Var(I_C) = \boldsymbol{\beta}'_C\mathbf{T}_C\boldsymbol{\beta}_C$ are the variances of $H$ and $I_C$, whereas $\mathbf{a}'_C\Psi_C\boldsymbol{\beta}_C$ and $\rho_{HI_C}$ are the covariance and the correlation between $H$ and $I_C$ respectively; $\mathbf{T}_C = Var\begin{bmatrix} \mathbf{y} \\ \boldsymbol{\gamma} \end{bmatrix} = \begin{bmatrix} \mathbf{P} & \Gamma \\ \Gamma & \Gamma \end{bmatrix}$ and $\Psi_C = Var\begin{bmatrix} \mathbf{g} \\ \boldsymbol{\gamma} \end{bmatrix} = \begin{bmatrix} \mathbf{C} & \Gamma \\ \Gamma & \Gamma \end{bmatrix}$ are block matrices of the phenotypic covariance matrix, $\mathbf{P} = Var(\mathbf{y})$, the genomic covariance matrix, $\Gamma = Var(\boldsymbol{\gamma})$, and the genetic breeding values covariance matrix, $\mathbf{C} = Var(\mathbf{g})$.

Suppose that matrices $\Psi_C$ and $\mathbf{T}_C$ are known; then the CLGSI vector of coefficients that simultaneously maximizes $\rho_{HI_C}$ and $R_C$ can be written as

$$
\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\Psi_C\mathbf{a}_C, \tag{5.31}
$$

whence the optimized CLGSI is

$$I_C = \boldsymbol{\beta}'_C\mathbf{t}_C. \tag{5.32}$$

Equations (5.31) and (5.32) indicate that the CLGSI is an application of the LPSI to the genomic selection context.

From Eq. (5.31), the maximized CLGSI selection response, expected genetic gain per trait and accuracy can be written as

$$R_C = k_I\sqrt{\boldsymbol{\beta}'_C\mathbf{T}_C\boldsymbol{\beta}_C}, \tag{5.33}$$

$$\mathbf{E}_C = k_I\frac{\Psi_C\boldsymbol{\beta}_C}{\sqrt{\boldsymbol{\beta}'_C\mathbf{T}_C\boldsymbol{\beta}_C}} \tag{5.34}$$

and

$$
\rho_{HI_C} = \frac{\sqrt{\boldsymbol{\beta}'_C\mathbf{T}_C\boldsymbol{\beta}_C}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}, \tag{5.35}
$$

respectively. Note that the maximized LPSI accuracy is $\rho_{HI} = \dfrac{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}$ (see Chap. 2). The denominators of the CLGSI accuracy and of $\rho_{HI}$ are the same; however, the numerators differ. We would expect that $\sqrt{\boldsymbol{\beta}'_C\mathbf{T}_C\boldsymbol{\beta}_C} \ge \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$, and then $\rho_{HI_C} \ge \rho_{HI}$. Similar results can be observed when comparing the maximized LPSI selection response and expected genetic gain per trait with the maximized CLGSI selection response and expected genetic gain per trait.

#### 5.2.2 Relationship Between the CLGSI and the LGSI

As we have indicated, the CLGSI is mathematically equivalent to the LMSI; thus, it has similar statistical properties to those of the LMSI, some of which are described in this section. The rest can be seen in Chap. 4. Let $\mathbf{Q}_C = \mathbf{T}_C^{-1}\Psi_C$; then matrix $\mathbf{Q}_C$ can be written as

$$\mathbf{Q}\_{C} = \begin{bmatrix} \left(\mathbf{P} - \Gamma\right)^{-1} (\mathbf{C} - \Gamma) & \mathbf{0} \\ \mathbf{I} - \left(\mathbf{P} - \Gamma\right)^{-1} (\mathbf{C} - \Gamma) & \mathbf{I} \end{bmatrix},\tag{5.36}$$

whence, as $\mathbf{w}'_2 = [0_1 \;\; \cdots \;\; 0_t]$, the two sub-vectors that make up vector $\boldsymbol{\beta}_C = \mathbf{Q}_C\mathbf{a}_C$, or $\boldsymbol{\beta}'_C = [\boldsymbol{\beta}'_{\mathbf{y}} \;\; \boldsymbol{\beta}'_G]$, can be written as

$$\boldsymbol{\beta}_{\mathbf{y}} = (\mathbf{P} - \Gamma)^{-1}(\mathbf{C} - \Gamma)\mathbf{w} \tag{5.37}$$

and

$$\boldsymbol{\beta}_G = \left[\mathbf{I} - (\mathbf{P} - \Gamma)^{-1}(\mathbf{C} - \Gamma)\right]\mathbf{w} = \mathbf{w} - \boldsymbol{\beta}_{\mathbf{y}}. \tag{5.38}$$
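The block form of $\mathbf{Q}_C$ in Eq. (5.36) can be verified directly. As a sketch, assuming that $\mathbf{P} - \Gamma$ and $\Gamma$ are invertible, the inverse of $\mathbf{T}_C$ is

$$
\mathbf{T}_C^{-1} = \begin{bmatrix} \mathbf{P} & \Gamma \\ \Gamma & \Gamma \end{bmatrix}^{-1} = \begin{bmatrix} (\mathbf{P}-\Gamma)^{-1} & -(\mathbf{P}-\Gamma)^{-1} \\ -(\mathbf{P}-\Gamma)^{-1} & \Gamma^{-1} + (\mathbf{P}-\Gamma)^{-1} \end{bmatrix},
$$

which can be checked by multiplying it by $\mathbf{T}_C$ block by block. Post-multiplying by $\Psi_C = \begin{bmatrix} \mathbf{C} & \Gamma \\ \Gamma & \Gamma \end{bmatrix}$ then gives

$$
\mathbf{T}_C^{-1}\Psi_C = \begin{bmatrix} (\mathbf{P}-\Gamma)^{-1}(\mathbf{C}-\Gamma) & \mathbf{0} \\ \mathbf{I} - (\mathbf{P}-\Gamma)^{-1}(\mathbf{C}-\Gamma) & \mathbf{I} \end{bmatrix} = \mathbf{Q}_C .
$$

Because the second half of $\mathbf{a}_C$ is the null vector $\mathbf{w}_2$, the product $\mathbf{Q}_C\mathbf{a}_C$ picks out the first block column applied to $\mathbf{w}$, which yields the two sub-vectors in Eqs. (5.37) and (5.38).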

When $\Gamma$ is equal to the null matrix (no genomic information), Eq. (5.37) is equal to $\boldsymbol{\beta}_{\mathbf{y}} = \mathbf{P}^{-1}\mathbf{C}\mathbf{w} = \mathbf{b}$ and $R_C = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = R_I$, which are the LPSI vector of coefficients and the selection response.

By Eqs. (5.37) and (5.38), the maximized CLGSI selection response and the optimized CLGSI can be written as

$$R_C = k_I\sqrt{\mathbf{w}'\mathbf{C}(\mathbf{P}-\Gamma)^{-1}(\mathbf{C}-\Gamma)\mathbf{w} + \mathbf{w}'\Gamma\left[\mathbf{I} - (\mathbf{P}-\Gamma)^{-1}(\mathbf{C}-\Gamma)\right]\mathbf{w}} \tag{5.39}$$

and

$$I_C = \boldsymbol{\beta}'_{\mathbf{y}}\mathbf{y} + \boldsymbol{\beta}'_G\boldsymbol{\gamma} = \mathbf{w}'\boldsymbol{\gamma} + \boldsymbol{\beta}'_{\mathbf{y}}(\mathbf{y} - \boldsymbol{\gamma}), \tag{5.40}$$

respectively.

Assume that when the number of markers and genotypes increases, matrix $\Gamma$ tends to matrix $\mathbf{C}$ and that, at the limit, $\Gamma = \mathbf{C}$; then, Eq. (5.39) can be written as $R_C = k_I\sqrt{\mathbf{w}'\Gamma\mathbf{w}} = R_{I_G}$ (except for $L_G$); in addition, $\boldsymbol{\beta}_{\mathbf{y}} = \mathbf{0}$ and $\boldsymbol{\beta}_G = \mathbf{w}$, the weights of the LGSI, and, in this latter case, the CLGSI is equal to the LGSI, as we would expect. Thus, in the asymptotic context, the LGSI and the CLGSI are the same.

An additional interesting result of the relationship between the CLGSI and the LGSI is as follows. The maximized correlation between H and IC (or CLGSI accuracy) can be written as

$$\rho_{HI_C} = \frac{\mathbf{a}'_C\Psi_C\boldsymbol{\beta}_C}{\sqrt{\mathbf{a}'_C\Psi_C\mathbf{a}_C}\sqrt{\boldsymbol{\beta}'_C\mathbf{T}_C\boldsymbol{\beta}_C}}; \tag{5.41}$$

however, when $\Gamma = \mathbf{C}$, $\Psi_C = \begin{bmatrix} \Gamma & \Gamma \\ \Gamma & \Gamma \end{bmatrix}$, $\boldsymbol{\beta}_{\mathbf{y}} = \mathbf{0}$, $\boldsymbol{\beta}_G = \mathbf{w}$, and $\boldsymbol{\beta}'_C = [\boldsymbol{\beta}'_{\mathbf{y}} \;\; \boldsymbol{\beta}'_G] = [\mathbf{0}' \;\; \mathbf{w}']$, whence $\mathbf{a}'_C\Psi_C\boldsymbol{\beta}_C = \mathbf{a}'_C\Psi_C\mathbf{a}_C = \boldsymbol{\beta}'_C\mathbf{T}_C\boldsymbol{\beta}_C = \mathbf{w}'\Gamma\mathbf{w}$, and Eq. (5.41) is equal to 1. That is, the maximum correlation between $H$ and $I_C$ in the asymptotic context is equal to the maximum correlation between $H$ and the LGSI, and that value will be equal to 1.

The asymptotic relationship between the CLGSI expected genetic gain per trait, $\mathbf{E}_C$ (Eq. 5.34), and the LGSI expected genetic gain per trait, $\mathbf{E}_{I_G}$ (Eq. 5.16), is as follows. When $\Gamma = \mathbf{C}$, $\Psi_C = \begin{bmatrix} \Gamma & \Gamma \\ \Gamma & \Gamma \end{bmatrix}$ and $\boldsymbol{\beta}'_C = [\mathbf{0}' \;\; \mathbf{w}']$, whence

$$\mathbf{E}_C = k_I\frac{\Psi_C\boldsymbol{\beta}_C}{\sqrt{\boldsymbol{\beta}'_C\mathbf{T}_C\boldsymbol{\beta}_C}} = k_I\frac{2\Gamma\mathbf{w}}{\sqrt{\mathbf{w}'\Gamma\mathbf{w}}} = 2\mathbf{E}_{I_G}. \tag{5.42}$$

This means that in the asymptotic context, the CLGSI expected genetic gain per trait is twice the LGSI expected genetic gain per trait. Of course, 2 is only a proportionality constant; thus, in reality, $\mathbf{E}_C = \mathbf{E}_{I_G}$.

#### 5.2.3 Statistical Properties of the CLGSI

Assume that $H$ and $I_C$ have a bivariate joint normal distribution; $\mathbf{P}$, $\mathbf{C}$, $\Gamma$, and $\mathbf{w}$ are known; and $\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\Psi_C\mathbf{a}_C$; then, the CLGSI properties are as follows:


Note that CLGSI properties 1 to 4 are the same as LMSI properties 1 to 4 and that both indices jointly incorporate phenotypic and marker information to predict the net genetic merit; however, the LMSI incorporates the marker information by the marker score values, whereas the CLGSI uses the GEBVs.

#### 5.2.4 Estimating the CLGSI Parameters

Using the real maize (Zea mays) F2 population with 248 genotypes (each with two repetitions), 233 molecular markers, and three traits (GY, ton ha$^{-1}$; EHT, cm; and PHT, cm) described in Sect. 5.1.8 of this chapter, we estimated matrices $\mathbf{P}$ and $\mathbf{C}$ using Eqs. (2.22) to (2.24) described in Chap. 2 of this book. The estimated matrices were

$$\widehat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}.$$

In a similar manner, we estimated matrix $\Gamma$ using Eqs. (5.21) to (5.23). The estimated matrix was

$$\widehat{\Gamma} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}.$$

Note that matrices $\widehat{\mathbf{C}}$ and $\widehat{\Gamma}$ have similar values. This means that, in the asymptotic context, we can assume that matrix $\Gamma$ tends to matrix $\mathbf{C}$.

To estimate the CLGSI and its associated parameters (selection response, expected genetic gain per trait, etc.), we need to estimate the vector of coefficients $\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\Psi_C\mathbf{a}_C$ as $\widehat{\boldsymbol{\beta}}_C = \widehat{\mathbf{T}}_C^{-1}\widehat{\Psi}_C\mathbf{a}_C$, where $\widehat{\mathbf{T}}_C = \begin{bmatrix} \widehat{\mathbf{P}} & \widehat{\Gamma} \\ \widehat{\Gamma} & \widehat{\Gamma} \end{bmatrix}$ and $\widehat{\Psi}_C = \begin{bmatrix} \widehat{\mathbf{C}} & \widehat{\Gamma} \\ \widehat{\Gamma} & \widehat{\Gamma} \end{bmatrix}$ are estimates of matrices $\mathbf{T}_C = \begin{bmatrix} \mathbf{P} & \Gamma \\ \Gamma & \Gamma \end{bmatrix}$ and $\Psi_C = \begin{bmatrix} \mathbf{C} & \Gamma \\ \Gamma & \Gamma \end{bmatrix}$ respectively. The estimated CLGSI vector of coefficients $\widehat{\boldsymbol{\beta}}_C = \widehat{\mathbf{T}}_C^{-1}\widehat{\Psi}_C\mathbf{a}_C$ comprises the vector of phenotypic weights, $\widehat{\boldsymbol{\beta}}_{\mathbf{y}} = (\widehat{\mathbf{P}} - \widehat{\Gamma})^{-1}(\widehat{\mathbf{C}} - \widehat{\Gamma})\mathbf{w}$, and the vector of genomic weights, $\widehat{\boldsymbol{\beta}}_G = [\mathbf{I} - (\widehat{\mathbf{P}} - \widehat{\Gamma})^{-1}(\widehat{\mathbf{C}} - \widehat{\Gamma})]\mathbf{w}$.

Let $\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1]$ be the vector of economic weights; then, according to the estimated matrices $\widehat{\mathbf{P}}$, $\widehat{\mathbf{C}}$, and $\widehat{\Gamma}$, $\widehat{\boldsymbol{\beta}}'_{\mathbf{y}} = [0.08 \;\; -0.02 \;\; -0.01]$ and $\widehat{\boldsymbol{\beta}}'_G = [4.92 \;\; -0.08 \;\; -0.09]$, whence the estimated CLGSI in the training population can be written as

$$
\widehat{I}_C = \widehat{\boldsymbol{\beta}}'_{\mathbf{y}}\mathbf{y} + \widehat{\boldsymbol{\beta}}'_G\widehat{\boldsymbol{\gamma}}. \tag{5.43}
$$

Suppose a selection intensity of 10% ($k_I = 1.755$); then, the estimated CLGSI selection response and expected genetic gain per trait were $\widehat{R}_C = k_I\sqrt{\widehat{\boldsymbol{\beta}}'_C\widehat{\mathbf{T}}_C\widehat{\boldsymbol{\beta}}_C} = 1.54$ and $\widehat{\mathbf{E}}'_C = k_I\dfrac{\widehat{\boldsymbol{\beta}}'_C\widehat{\Psi}_C}{\sqrt{\widehat{\boldsymbol{\beta}}'_C\widehat{\mathbf{T}}_C\widehat{\boldsymbol{\beta}}_C}} = [0.36 \;\; 1.04 \;\; 1.70 \;\; 0.36 \;\; 1.53 \;\; 2.38]$ respectively, whereas the estimated CLGSI accuracy was $\widehat{\rho}_{HI_C} = \dfrac{\widehat{\sigma}_{I_C}}{\widehat{\sigma}_H} = 0.814$.
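The CLGSI weights, selection response, and accuracy of this example can be recomputed from the three estimated matrices with Eqs. (5.37), (5.38), (5.33), and (5.35). The sketch below is one way to do so in plain Python; the helper names are hypothetical, and the negative signs on the EHT and PHT economic weights are an assumption inferred from the reported results. Because the published matrices are rounded to two decimals, the output should agree with the reported values only to about two decimals.

```python
import math


def solve3(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    n = len(m)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def matvec(a, v):
    """Matrix-vector product for lists of lists."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(a, b):
    """Element-wise difference of two matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


P = [[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]]
C = [[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]]
Gamma = [[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]]
w = [5.0, -0.1, -0.1]   # assumed signs: GY positive, EHT and PHT negative
k_i = 1.755             # 10% selection intensity

# beta_y = (P - Gamma)^-1 (C - Gamma) w and beta_G = w - beta_y (Eqs. 5.37, 5.38)
beta_y = solve3(sub(P, Gamma), matvec(sub(C, Gamma), w))
beta_g = [wi - bi for wi, bi in zip(w, beta_y)]

# beta_C' T_C beta_C = w'(C beta_y + Gamma beta_G), the quantity under the root in Eq. (5.39)
var_index = dot(w, [c + g for c, g in zip(matvec(C, beta_y), matvec(Gamma, beta_g))])
response = k_i * math.sqrt(var_index)                                # Eq. (5.33)
accuracy = math.sqrt(var_index) / math.sqrt(dot(w, matvec(C, w)))    # Eq. (5.35)
print([round(b, 2) for b in beta_y], [round(b, 2) for b in beta_g])
print(round(response, 2), round(accuracy, 3))
```

Solving the linear system instead of forming $(\widehat{\mathbf{P}} - \widehat{\Gamma})^{-1}$ explicitly is a design choice that keeps the sketch short; either route gives the same weights.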

The estimated LPSI selection response, expected genetic gain per trait, and accuracy were 0.601, $[0.09 \;\; -0.81 \;\; -0.89]$, and 0.32 respectively; thus, the CLGSI was more efficient than the LPSI at predicting the net genetic merit, because the CLGSI selection response and accuracy were 1.54 and 0.814 respectively.
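The LPSI benchmark can be recomputed in the same way from $\widehat{\mathbf{P}}$ and $\widehat{\mathbf{C}}$, using $\mathbf{b} = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$, $R_I = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$, and $\rho_{HI} = \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}/\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$. The sketch below is illustrative: the helper names are hypothetical, the signs of the economic weights are an assumption, and the expected-gain formula $k_I\mathbf{C}\mathbf{b}/\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ is the standard LPSI expression rather than one stated in this section.

```python
import math


def solve_linear(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def matvec(a, v):
    """Matrix-vector product for lists of lists."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


P = [[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]]
C = [[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]]
w = [5.0, -0.1, -0.1]   # assumed signs: GY positive, EHT and PHT negative
k_i = 1.755             # 10% selection intensity

b = solve_linear(P, matvec(C, w))                  # LPSI coefficients b = P^-1 C w
sigma_i = math.sqrt(dot(b, matvec(P, b)))          # sqrt(b' P b)
sigma_h = math.sqrt(dot(w, matvec(C, w)))          # sqrt(w' C w)
response = k_i * sigma_i                           # R_I = k_I sqrt(b' P b)
gains = [k_i * v / sigma_i for v in matvec(C, b)]  # k_I C b / sqrt(b' P b)
accuracy = sigma_i / sigma_h
print(round(response, 3), [round(v, 2) for v in gains], round(accuracy, 2))
```

Under these assumptions the recomputed gains are positive for GY and negative for EHT and PHT, in line with the direction of the economic weights.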

#### 5.2.5 LGSI and CLGSI Efficiency Vs LMSI, GW-LMSI and LPSI Efficiency

In this subsection, we compare the accuracy, selection response, and efficiency of the LGSI and CLGSI with the LMSI, the GW-LMSI, and the LPSI using the simulated data for a maize (Zea mays) population described in Chap. 2, Sect. 2.8.1.

Figure 5.3 presents the estimated accuracy values of the LMSI, the LGSI, the CLGSI, the LPSI, and the GW-LMSI for five simulated selection cycles. According to these results, for the first three selection cycles, the estimated accuracies of the indices, in decreasing order, were LMSI > LGSI > CLGSI > LPSI > GW-LMSI. That is, the highest estimated accuracy was obtained with the LMSI, whereas the lowest was obtained with the GW-LMSI. For the fourth and fifth selection cycles, the estimated accuracies, in decreasing order, were LMSI > LPSI > CLGSI > LGSI > GW-LMSI. This means that in all five selection cycles, the LMSI had the highest accuracy and the GW-LMSI had the lowest accuracy, whereas the estimated LGSI accuracy was reduced to fourth place. Thus, the accuracy of the LGSI tended to decrease after the first three selection cycles whereas LPSI accuracy was a constant.

Fig. 5.3 Estimated accuracy values of the linear molecular selection index (LMSI), the LGSI, the combined LGSI (CLGSI), the LPSI, and the genome-wide LMSI (GW-LMSI) with the net genetic merit for four traits, 2500 markers, and 500 genotypes (each with four repetitions) in one environment for five simulated selection cycles

Table 5.3 Estimated selection response of the linear phenotypic selection index (LPSI), the linear molecular selection index (LMSI), the genome-wide LMSI (GW-LMSI), the linear genomic selection index (LGSI), and the combined LGSI (CLGSI), not including (first part of the Table) and including (second part of the Table) the interval length between selection cycles, obtained using five simulated selection cycles


a The interval length for the LPSI, LMSI, GW-LMSI, and CLGSI was 4 years, whereas the interval length for the LGSI was 1.5 years


Table 5.4 Estimated accuracy of the LMSI, the LGSI, the CLGSI, the LPSI, and the GW-LMSI; LMSI efficiency compared with LGSI, CLGSI, LPSI, and GW-LMSI efficiencies, expressed in percentages, for five simulated selection cycles

To compare LGSI efficiency versus the efficiency of the other selection indices, we assumed that the interval between selection cycles in the LGSI is 1.5 years, whereas for the CLGSI, LMSI, GW-LMSI, and LPSI, the interval is 4.0 years. Table 5.3 presents the estimated selection response of the LPSI, the LMSI, the GW-LMSI, the LGSI, and the CLGSI, not including and including the interval between selection cycles (first and second parts of Table 5.3 respectively), obtained using five simulated selection cycles. According to the first part of Table 5.3, the average estimated selection responses, in decreasing order, of the LMSI, CLGSI, LPSI, GW-LMSI, and LGSI for the five simulated selection cycles were 17.82, 15.30, 15.22, 13.24, and 13.11 respectively, when the length of the interval between selection cycles was not included. If the length of the interval between selection cycles is included, so that the indices are compared in terms of time, the estimated selection responses of the LMSI, CLGSI, LPSI, and GW-LMSI must be divided by 4 in each selection cycle, and the estimated LGSI selection response must be divided by 1.5. Thus, according to the second part of Table 5.3, if we include the length of the interval between selection cycles, the average estimated selection responses, in decreasing order, of the LGSI, LMSI, CLGSI, LPSI, and GW-LMSI for the five simulated selection cycles were 8.74, 4.46, 3.83, 3.80, and 3.31 respectively. This means that in terms of time, the efficiency of the LGSI was higher than the efficiency of the other four selection indices.
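The time adjustment described above amounts to dividing each average response by its interval length. A minimal sketch follows, using the averages quoted in the text; the dictionary and function names are illustrative assumptions, not part of the original analysis.

```python
# Average estimated selection responses over five simulated cycles,
# not including the interval length (values quoted in the text).
avg_response = {"LMSI": 17.82, "CLGSI": 15.30, "LPSI": 15.22,
                "GW-LMSI": 13.24, "LGSI": 13.11}

# Interval between selection cycles in years, as assumed in the text.
interval_years = {"LMSI": 4.0, "CLGSI": 4.0, "LPSI": 4.0,
                  "GW-LMSI": 4.0, "LGSI": 1.5}


def response_per_year(response, years):
    """Selection response per unit of time (response divided by interval length)."""
    return response / years


per_year = {name: response_per_year(value, interval_years[name])
            for name, value in avg_response.items()}

# Rank the indices from highest to lowest response per year.
ranking = sorted(per_year, key=per_year.get, reverse=True)

for name in ranking:
    print(f"{name:8s} {per_year[name]:.2f}")
```

With these inputs the LGSI moves from last place (per cycle) to first place (per year), which is the point made in the paragraph above.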

Table 5.4 presents the estimated accuracy of the LMSI, LGSI, CLGSI, LPSI, and the GW-LMSI. In addition, Table 5.4 presents the efficiency when predicting the net genetic merit of the LMSI with respect to the LGSI, CLGSI, LPSI, and GW-LMSI as percentages, for five simulated selection cycles. Note that in this case, LMSI efficiency was higher than the efficiency of the other four selection indices, because the LMSI had the highest correlation with the net genetic merit.

#### References

Beyene Y, Semagn K, Mugo S, Tarekegne A, Babu R et al (2015) Genetic gains in grain yield through genomic selection in eight bi-parental maize populations under drought stress. Crop Sci 55:154–163


Lorenz AJ, Chao S, Asoro FG, Heffner EL, Hayashi T et al (2011) Genomic selection in plant breeding: knowledge and prospects. Adv Agron 110:77–123



### Chapter 6 Constrained Linear Genomic Selection Indices

Abstract The constrained linear genomic selection indices are the null restricted and the predetermined proportional gain linear genomic selection indices (RLGSI and PPG-LGSI respectively), which are linear combinations of genomic estimated breeding values (GEBVs) used to predict the net genetic merit. They result from a direct application of the restricted and the predetermined proportional gain linear phenotypic selection index theory to the genomic selection context. The RLGSI can be extended to a combined RLGSI (CRLGSI) and the PPG-LGSI can be extended to a combined PPG-LGSI (CPPG-LGSI); the latter indices use phenotypic and GEBV information jointly in the prediction of net genetic merit. The main difference between the RLGSI and PPG-LGSI on the one hand and the CRLGSI and CPPG-LGSI on the other is that the RLGSI and PPG-LGSI are useful in a testing population where there is only marker information, whereas the CRLGSI and CPPG-LGSI can be used only in training populations where joint phenotypic and marker information is available. The RLGSI and CRLGSI allow restrictions equal to zero to be imposed on the expected genetic advance of some traits, whereas the PPG-LGSI and CPPG-LGSI allow predetermined proportional restriction values to be imposed on the expected trait genetic gains to make some traits change their mean values based on a predetermined level. We describe the foregoing four indices and validate their theoretical results using real and simulated data.

#### 6.1 The Restricted Linear Genomic Selection Index

Let $H = \mathbf{w}'\mathbf{g}$ be the net genetic merit and $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$ the linear genomic selection index (LGSI, see Chap. 5 for details), where $\mathbf{g}$, $\boldsymbol{\gamma}$, $\mathbf{w}$, and $\boldsymbol{\beta}$ are $t \times 1$ vectors ($t$ = number of traits) of breeding values, genomic breeding values, economic weights, and LGSI coefficients respectively. It can be shown that $\mathrm{Cov}(I_G, \mathbf{g}) = \boldsymbol{\Gamma}\boldsymbol{\beta}$ is the covariance between $\mathbf{g}$ and $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$, and that $\mathrm{Var}(\boldsymbol{\gamma}) = \boldsymbol{\Gamma}$ is the genomic covariance matrix of size $t \times t$ (see Chap. 5 for details). The objective of the restricted linear genomic selection index (RLGSI) is to improve only $(t - r)$ of $t$ ($r < t$) traits (leaving $r$ of them fixed) in a testing population using only genomic estimated breeding values (GEBVs). The RLGSI minimizes the mean squared difference between $I_G$ and $H$, $E[(H - I_G)^2]$, with respect to $\boldsymbol{\beta}$ under the restriction $\mathrm{Cov}(I_G, \mathbf{U}'\mathbf{g}) = \mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \mathbf{0}$, where $\mathbf{U}'$ is a matrix $(t - 1) \times t$ of 1s and 0s, in a similar manner to the restricted linear phenotypic selection index (RLPSI) described in Chap. 3 in the phenotypic selection context.

#### 6.1.1 The Maximized RLGSI Parameters

Let $\mathrm{Var}(I_G) = \boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}$ be the variance of $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$, $\mathbf{w}'\mathbf{C}\mathbf{w}$ the variance of $H = \mathbf{w}'\mathbf{g}$, and $\mathrm{Cov}(I_G, H) = \mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta}$ the covariance between $H$ and $I_G$. The mean squared difference between $H$ and $I_G$ can be written as $E[(H - I_G)^2]$, which should be minimized under the restriction $\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \mathbf{0}$ assuming that $\boldsymbol{\Gamma}$, $\mathbf{C}$, $\mathbf{U}'$, and $\mathbf{w}$ are known, i.e., it is necessary to minimize the function

$$f_R(\boldsymbol{\beta}, \mathbf{v}) = \mathbf{w}'\mathbf{C}\mathbf{w} + \boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta} - 2\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta} + 2\mathbf{v}'\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} \tag{6.1}$$

with respect to vectors $\boldsymbol{\beta}$ and $\mathbf{v}' = [v_1 \;\; v_2 \;\; \cdots \;\; v_{r-1}]$, where $\mathbf{v}$ is a vector of Lagrange multipliers. In matrix notation, the derivative results of Eq. (6.1) are

$$
\begin{bmatrix} \boldsymbol{\beta} \\ \mathbf{v} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\mathbf{U} \\ \mathbf{U}'\boldsymbol{\Gamma} & \mathbf{0} \end{bmatrix}^{-1} \begin{bmatrix} \boldsymbol{\Gamma}\mathbf{w} \\ \mathbf{0} \end{bmatrix}. \tag{6.2}
$$

Following the procedure described in Chap. 3 (Eqs. 3.2 to 3.5), it can be shown that the RLGSI vector of coefficients that minimizes $E[(H - I_G)^2]$ under the restriction $\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \mathbf{0}$ is

$$
\boldsymbol{\beta}_{RG} = \mathbf{K}_G \mathbf{w}, \tag{6.3}
$$

where $\mathbf{K}_G = [\mathbf{I}_t - \mathbf{Q}_G]$, $\mathbf{Q}_G = \mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{U}'\boldsymbol{\Gamma}$, $\mathbf{w}$ is a vector of economic weights, and $\mathbf{I}_t$ is an identity matrix $t \times t$. When no restrictions are imposed on any of the traits, $\mathbf{U}'$ is a null matrix and $\boldsymbol{\beta}_{RG} = \mathbf{w}$, the optimized LGSI vector of coefficients (see Chap. 5 for details).

By Eq. (6.3), the RLGSI, and the maximized RLGSI selection response and expected genetic gain per trait can be written as

$$I_{RG} = \boldsymbol{\beta}_{RG}'\boldsymbol{\gamma}, \tag{6.4}$$

$$R_{RG} = \frac{k_I}{L_G} \sqrt{\boldsymbol{\beta}_{RG}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{RG}} \tag{6.5}$$

and

$$\mathbf{E}_{RG} = \frac{k_I}{L_G} \frac{\boldsymbol{\Gamma}\boldsymbol{\beta}_{RG}}{\sqrt{\boldsymbol{\beta}_{RG}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{RG}}}, \tag{6.6}$$

respectively, where $k_I$ is the standardized selection differential (or selection intensity) associated with the RLGSI, and $L_G$ is the interval between selection cycles or the time required to complete a selection cycle using the RLGSI. Equations (6.4) to (6.6) depend only on GEBV information; thus, they are useful in testing populations.
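As a quick consistency check (a sketch that uses only the definitions of Eq. 6.3 and assumes that $\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U}$ is non-singular), substituting $\boldsymbol{\beta}_{RG}$ into the restriction shows that the restriction is satisfied identically:

$$\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{RG} = \mathbf{U}'\boldsymbol{\Gamma}\left[\mathbf{I}_t - \mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{U}'\boldsymbol{\Gamma}\right]\mathbf{w} = \mathbf{U}'\boldsymbol{\Gamma}\mathbf{w} - (\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{U}'\boldsymbol{\Gamma}\mathbf{w} = \mathbf{0}.$$

Because the restricted entries of $\mathbf{E}_{RG}$ in Eq. (6.6) are proportional to $\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{RG}$, the expected genetic gains of the restricted traits are zero.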

#### 6.1.2 Statistical Properties of RLGSI

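Assuming that $H = \mathbf{w}'\mathbf{g}$ and $I_{RG} = \boldsymbol{\beta}_{RG}'\boldsymbol{\gamma}$ have bivariate joint normal distribution, $\boldsymbol{\beta}_{RG} = \mathbf{K}_G\mathbf{w}$, and $\boldsymbol{\Gamma}$, $\mathbf{C}$, and $\mathbf{w}$ are known, it can be shown that the RLGSI has the following properties: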


The statistical RLGSI properties are equal to the statistical RLPSI properties. Thus the RLGSI is an application of the RLPSI to the genomic selection context.

#### 6.1.3 Numerical Examples

To estimate the parameters associated with the RLGSI, we use the real data set described in Chap. 5, Sect. 5.1.8, where we found that, in the testing population, the estimate of matrix $\boldsymbol{\Gamma}$ was

$$\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.21 & 2.95 & 5.00 \\ 2.95 & 42.41 & 71.11 \\ 5.00 & 71.11 & 121.53 \end{bmatrix}.$$

We use this matrix and the GEBVs associated with the traits grain yield (GY, ton ha$^{-1}$), ear height (EHT, cm), and plant height (PHT, cm) to illustrate the RLGSI theoretical results.

Suppose that on the RLGSI expected genetic gain per trait we impose one and two null restrictions using matrices $\mathbf{U}_1' = [1 \;\; 0 \;\; 0]$ and $\mathbf{U}_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ (see Chap. 3, Sect. 3.1.3, for details about matrix $\mathbf{U}'$). We need to estimate the RLGSI vector of coefficients ($\boldsymbol{\beta}_{RG} = \mathbf{K}_G\mathbf{w}$) as $\widehat{\boldsymbol{\beta}}_{RG} = \widehat{\mathbf{K}}_G\mathbf{w}$, where $\widehat{\mathbf{K}}_G = [\mathbf{I}_3 - \widehat{\mathbf{Q}}_G]$ and $\widehat{\mathbf{Q}}_G = \mathbf{U}(\mathbf{U}'\widehat{\boldsymbol{\Gamma}}\mathbf{U})^{-1}\mathbf{U}'\widehat{\boldsymbol{\Gamma}}$ are estimates of matrices $\mathbf{K}_G = [\mathbf{I}_3 - \mathbf{Q}_G]$ and $\mathbf{Q}_G = \mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{U}'\boldsymbol{\Gamma}$ respectively, and $\mathbf{I}_3$ is an identity matrix $3 \times 3$. The estimated $\mathbf{Q}_G$ matrices for restrictions $\mathbf{U}_1'$ and $\mathbf{U}_2'$ were

$$\widehat{\mathbf{Q}}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_1)^{-1}\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 1.0 & 14.05 & 23.81 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{Q}}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_2)^{-1}\mathbf{U}_2'\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 1.0 & 0 & 11.18 \\ 0 & 1.0 & 0.90 \\ 0 & 0 & 0 \end{bmatrix}$$

respectively, whereas the estimated $\mathbf{K}_G$ matrices for both restrictions were

$$\widehat{\mathbf{K}}_{G_1} = [\mathbf{I}_3 - \widehat{\mathbf{Q}}_{G_1}] = \begin{bmatrix} 0 & -14.05 & -23.81 \\ 0 & 1.0 & 0 \\ 0 & 0 & 1.0 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{K}}_{G_2} = [\mathbf{I}_3 - \widehat{\mathbf{Q}}_{G_2}] = \begin{bmatrix} 0 & 0 & -11.18 \\ 0 & 0 & -0.90 \\ 0 & 0 & 1.0 \end{bmatrix}.$$

Let $\mathbf{w}' = [5 \;\; {-0.1} \;\; {-0.1}]$ be the vector of economic weights; then the estimated RLGSI vectors of coefficients for one and two null restrictions were $\widehat{\boldsymbol{\beta}}_{RG_1}' = \mathbf{w}'\widehat{\mathbf{K}}_{G_1}' = [3.78 \;\; {-0.1} \;\; {-0.1}]$ and $\widehat{\boldsymbol{\beta}}_{RG_2}' = \mathbf{w}'\widehat{\mathbf{K}}_{G_2}' = [1.12 \;\; 0.09 \;\; {-0.1}]$ respectively, and the estimated RLGSI for both restrictions can be written as $\widehat{I}_{RG_1} = 3.78\,\mathrm{GEBV}_1 - 0.1\,\mathrm{GEBV}_2 - 0.1\,\mathrm{GEBV}_3$ and $\widehat{I}_{RG_2} = 1.12\,\mathrm{GEBV}_1 + 0.09\,\mathrm{GEBV}_2 - 0.1\,\mathrm{GEBV}_3$, where $\mathrm{GEBV}_1$, $\mathrm{GEBV}_2$, and $\mathrm{GEBV}_3$ are the genomic estimated breeding values associated with traits GY, EHT, and PHT respectively in the testing population.

Table 6.1 presents 20 genotypes selected from a population of 380 genotypes and the GEBVs in the testing population ranked according to the estimated RLGSI values for one restriction, where $\mathbf{U}_1' = [1 \;\; 0 \;\; 0]$. The estimated RLGSI values for genotypes 5 and 306 can be obtained as follows: $\widehat{I}_{RG_5} = 3.78(-0.6) - 0.1(-8.67) - 0.1(-15.97) = 0.196$ and $\widehat{I}_{RG_{306}} = 3.78(0.13) - 0.1(1.31) - 0.1(1.66) = 0.194$ respectively. This procedure is valid for any number of genotypes and GEBVs in the testing population.
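The index value of a genotype is simply the weighted sum of its GEBVs. The following minimal sketch reproduces the two values quoted above; the function name is illustrative, and the signs of the coefficients and GEBVs are an assumption, taken as those that reproduce the stated results of 0.196 and 0.194.

```python
# Estimated RLGSI coefficients for one null restriction (GY restricted);
# signs assumed from the worked example.
beta_rg1 = [3.78, -0.1, -0.1]

# GEBVs for (GY, EHT, PHT) of the two genotypes used in the worked example.
gebv = {
    5: [-0.6, -8.67, -15.97],
    306: [0.13, 1.31, 1.66],
}


def index_value(coefficients, values):
    """Linear index: sum of coefficient * GEBV over traits."""
    return sum(b * v for b, v in zip(coefficients, values))


scores = {genotype: index_value(beta_rg1, values)
          for genotype, values in gebv.items()}

# Rank genotypes from highest to lowest index value, as in Table 6.1.
ranked = sorted(scores, key=scores.get, reverse=True)
```

The same function applies unchanged to all 380 genotypes once their GEBVs are available.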

Table 6.1 Number of genotypes selected from 380 genotypes of a real testing population; genomic estimated breeding values (GEBVs) associated with three traits: grain yield (GY, ton ha$^{-1}$), ear height (EHT, cm), and plant height (PHT, cm) in the testing population, and estimated and ranked restricted linear genomic selection index (RLGSI) values obtained in the testing population for one null restriction


Assume a selection intensity of 10% ($k_{I_G} = 1.755$); then the estimated RLGSI selection response and expected genetic gain per trait, not including the interval length, were $\widehat{R}_{RG_1} = k_{I_G}\sqrt{\widehat{\boldsymbol{\beta}}_{RG_1}'\widehat{\boldsymbol{\Gamma}}\widehat{\boldsymbol{\beta}}_{RG_1}} = 0.40$ and $\widehat{\mathbf{E}}_{RG_1}' = k_{I_G}\dfrac{\widehat{\boldsymbol{\beta}}_{RG_1}'\widehat{\boldsymbol{\Gamma}}}{\sqrt{\widehat{\boldsymbol{\beta}}_{RG_1}'\widehat{\boldsymbol{\Gamma}}\widehat{\boldsymbol{\beta}}_{RG_1}}} = [0 \;\; {-1.42} \;\; {-2.58}]$ respectively. For two restrictions, with $\mathbf{U}_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$, the estimated RLGSI selection response and expected genetic gains, not including the interval length, were $\widehat{R}_{RG_2} = k_{I_G}\sqrt{\widehat{\boldsymbol{\beta}}_{RG_2}'\widehat{\boldsymbol{\Gamma}}\widehat{\boldsymbol{\beta}}_{RG_2}} = 0.23$ and $\widehat{\mathbf{E}}_{RG_2}' = k_{I_G}\dfrac{\widehat{\boldsymbol{\beta}}_{RG_2}'\widehat{\boldsymbol{\Gamma}}}{\sqrt{\widehat{\boldsymbol{\beta}}_{RG_2}'\widehat{\boldsymbol{\Gamma}}\widehat{\boldsymbol{\beta}}_{RG_2}}} = [0 \;\; 0 \;\; {-2.29}]$ respectively. When the number of restrictions increases, the estimated RLGSI selection response value decreases, whereas the number of zeros increases in the estimated RLGSI expected genetic gain per trait. The number of zeros in the estimated RLGSI expected genetic gain per trait is equal to the number of restrictions imposed on the RLGSI by matrix $\mathbf{U}'$, where each restriction appears as 1.
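The calculations in this example (Eqs. 6.3, 6.5, and 6.6 with the interval length omitted) can be sketched with the Python standard library alone. The helper names below are illustrative assumptions, the economic weights are taken as $[5, -0.1, -0.1]$, and because the published $\widehat{\boldsymbol{\Gamma}}$ is rounded, the results should agree with the text only to about two decimals.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (for the small r x r matrix U'ΓU)."""
    n = len(a)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def rlgsi(gamma, u_t, w, k=1.755):
    """Return (beta_RG, R_RG, E_RG) for a restriction matrix U' of 0s and 1s."""
    t = len(w)
    u = transpose(u_t)
    u_gamma = matmul(u_t, gamma)                                  # U'Γ
    q = matmul(matmul(u, inverse(matmul(u_gamma, u))), u_gamma)   # Q_G
    k_g = [[float(i == j) - q[i][j] for j in range(t)] for i in range(t)]
    beta = [sum(k_g[i][j] * w[j] for j in range(t)) for i in range(t)]
    gamma_beta = [sum(gamma[i][j] * beta[j] for j in range(t)) for i in range(t)]
    sd = math.sqrt(sum(b * gb for b, gb in zip(beta, gamma_beta)))
    return beta, k * sd, [k * gb / sd for gb in gamma_beta]


gamma_hat = [[0.21, 2.95, 5.00],
             [2.95, 42.41, 71.11],
             [5.00, 71.11, 121.53]]
w = [5.0, -0.1, -0.1]

beta1, r1, e1 = rlgsi(gamma_hat, [[1, 0, 0]], w)             # one null restriction
beta2, r2, e2 = rlgsi(gamma_hat, [[1, 0, 0], [0, 1, 0]], w)  # two null restrictions
```

In both cases the entries of the expected genetic gain that correspond to restricted traits come out as zero up to floating-point error, which is the defining property of the restriction.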

Figure 6.1 presents the frequency distribution of the estimated RLGSI values for one (Fig. 6.1a) and two null restrictions (Fig. 6.1b). For both restrictions the frequency distribution of the estimated RLGSI values approaches the normal distribution.

Fig. 6.1 Distribution of 380 estimated restricted linear genomic selection index (RLGSI) values with one (a) and two (b) null restrictions respectively obtained in a real testing population for one selection cycle in one environment

Now we use the simulated data set described in Chap. 2, Sect. 2.8.1, to compare RLPSI (restricted linear phenotypic selection index, see Chap. 3 for details) efficiency versus RLGSI efficiency. Table 6.2 presents the estimated RLPSI and RLGSI selection response for one, two, and three null restrictions imposed by matrices

$$\mathbf{U}_1' = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}, \quad \mathbf{U}_2' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{U}_3' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

for five simulated selection cycles including and not including the interval between selection cycles. In each selection cycle, the sample size was equal to 500 genotypes, each with four repetitions and four traits, whereas the selection intensity was 10% ($k_I = 1.755$); the interval lengths for the RLPSI and RLGSI were 4 and 1.5 years (Beyene et al. 2015) respectively.

Table 6.2 is divided into two parts. The first part presents the estimated RLPSI selection responses, whereas the second part presents the estimated RLGSI selection responses. Columns 2, 3, and 4 in Table 6.2 present the estimated RLPSI and RLGSI selection responses not including the interval length, whereas columns 5, 6, and 7 present the estimated RLPSI and RLGSI selection responses including the interval length. The averages of the estimated RLPSI selection response not including the interval length for one, two, and three restrictions were 7.04, 5.50, and 3.90, whereas when the interval length was included, the averages were 1.76, 1.38, and 0.98 respectively. The averages of the estimated RLGSI selection response not including the interval length for one, two, and three restrictions were 5.04, 3.72, and 2.79, whereas when the interval length was included the averages were 3.36, 2.48, and 1.86 respectively. These results indicate that when the interval length was included in the estimation of the RLPSI and RLGSI selection response, RLGSI efficiency was greater than RLPSI efficiency; conversely, when the interval length was not included, RLPSI efficiency was greater than RLGSI efficiency.

a The estimated RLPSI selection response was divided by 4

b The estimated RLGSI selection response was divided by 1.5

Table 6.3 Estimated RLPSI and RLGSI expected genetic gain per trait for 1, 2, and 3 null restrictions for 5 simulated selection cycles (each with 4 traits) not including the interval length between selection cycles

Table 6.3 presents the estimated RLPSI (first part) and RLGSI (second part) expected genetic gain per trait not including the interval between selection cycles for one, two, and three null restrictions in five simulated selection cycles. In this case, RLPSI efficiency is greater than RLGSI efficiency because the averages of the estimated RLPSI expected genetic gain per trait were 2.52, 2.26, and 2.26 for one null restriction; 2.84 and 2.65 for two null restrictions; and 3.90 for three null restrictions. For the same set of restrictions, the averages of the estimated RLGSI expected genetic gain per trait were 1.85, 1.13, and 2.06 for one null restriction; 1.52 and 2.19 for two null restrictions; and 2.79 for three null restrictions. However, divided by the interval length (4 years in the RLPSI), the averages of the estimated RLPSI expected genetic gain per trait were 0.63, 0.57, and 0.57 for one null restriction; 0.71 and 0.66 for two null restrictions; and 0.98 for three null restrictions. In a similar manner, dividing by the interval length (1.5 years in this case), the averages of the estimated RLGSI expected genetic gain per trait were 1.23, 0.75, and 1.37 for one restriction; 1.01 and 1.46 for two restrictions; and 1.86 for three restrictions.

Table 6.4 presents the estimated RLPSI heritability ($\widehat{h}^2_{I_R}$) values, the estimated restricted linear genomic selection index (RLGSI) accuracy ($\widehat{\rho}_{HI_{RG}}$) values, the values of $W = \dfrac{\widehat{\rho}_{HI_{RG}}}{\widehat{h}_{I_R}} L_{RP}$ ($L_{RP} = 4$), and the values of $\widehat{p} = 100(\widehat{\lambda}_R - 1)$, where $\widehat{\lambda}_R = \widehat{\rho}_{HI_R}/\widehat{\rho}_{HI_{RG}}$ and $\widehat{\rho}_{HI_R}$ is the estimated RLPSI accuracy, for one, two, and three restrictions for five simulated selection cycles. The RLGSI interval length was $L_{RG} = 1.5$, whereas the averages of the values of $W$ for each restriction were 1.22, 0.85, and 0.60; this means that the estimated Technow inequality (Technow et al. 2013), $L_{RG} < \dfrac{\widehat{\rho}_{HI_{RG}}}{\widehat{h}_{I_R}} L_{RP}$ (Chap. 5, Eq. 5.18), did not hold. Thus, according to the Technow inequality results, for this data set, RLGSI efficiency in terms of time was not greater than RLPSI efficiency. The inequality did not hold because the estimated RLGSI accuracy was very low, whereas RLPSI heritability was high: the averages of the estimated RLGSI accuracy for one, two, and three null restrictions were 0.25, 0.19, and 0.14 respectively, and the averages of the estimated RLPSI heritability values were 0.70, 0.78, and 0.88 respectively. Thus, according to these results, because the estimated RLGSI accuracy is very low and RLPSI heritability is high, RLGSI efficiency was lower than RLPSI efficiency in terms of time.
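The Technow inequality check can be sketched from the averages quoted above. Note the assumption: the text averages $W$ over cycles (1.22, 0.85, and 0.60), whereas this sketch evaluates $W$ at the averaged accuracy and heritability, so the values differ slightly while the conclusion is the same. Function and variable names are illustrative.

```python
import math

L_RG = 1.5  # RLGSI interval length (years)
L_RP = 4.0  # RLPSI interval length (years)

# Averages over five simulated cycles for one, two, and three null restrictions.
rlgsi_accuracy = [0.25, 0.19, 0.14]
rlpsi_heritability = [0.70, 0.78, 0.88]


def technow_bound(accuracy, heritability, phenotypic_interval):
    """W = (genomic index accuracy / sqrt(phenotypic index heritability)) * L_RP."""
    return accuracy / math.sqrt(heritability) * phenotypic_interval


bounds = [technow_bound(a, h2, L_RP)
          for a, h2 in zip(rlgsi_accuracy, rlpsi_heritability)]

# The genomic index is more efficient per unit of time only if L_RG < W.
holds = [L_RG < w for w in bounds]
```

For all three sets of restrictions the bound is below 1.5 years, so the inequality fails, in line with the conclusion drawn from Table 6.4.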

The last three columns of Table 6.4, from left to right, present the estimated $p$ values, $\widehat{p} = 100(\widehat{\lambda}_R - 1)$, for one, two, and three null restrictions in five simulated selection cycles. The averages of the $\widehat{p}$ values indicate that, for one, two, and three restrictions, RLPSI efficiency was 65.05%, 78.73%, and 74.09% greater than RLGSI efficiency at predicting the net genetic merit. Thus, for this data set, the RLPSI was a better predictor of the net genetic merit than the RLGSI in each cycle.

#### 6.2 The Predetermined Proportional Gain Linear Genomic Selection Index

#### 6.2.1 Objective of the PPG-LGSI

Let $\mathbf{d}' = [d_1 \;\; d_2 \;\; \cdots \;\; d_r]$ be a vector $1 \times r$ ($r$ is the number of predetermined proportional gains) of the predetermined proportional gains imposed by the breeder, and assume that $\mu_q$ is the population mean of the $q$th trait before selection. The objective of the predetermined proportional gain linear genomic selection index (PPG-LGSI) is to change $\mu_q$ to $\mu_q + d_q$ in the testing population, where $d_q$ is a predetermined change in $\mu_q$. It is possible to solve this problem by minimizing the mean squared difference between $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$ and $H = \mathbf{w}'\mathbf{g}$, $E[(H - I_G)^2]$, under the restriction $\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \theta_G\mathbf{d}$, where $\theta_G$ is a proportionality constant, or under the restriction $\mathbf{D}'\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \mathbf{0}$, where

$$\mathbf{D}' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix}$$

is a matrix $(r - 1) \times r$ (see Chap. 3 for details), and $d_q$ ($q = 1, 2, \ldots, r$) is the $q$th element of vector $\mathbf{d}' = [d_1 \;\; d_2 \;\; \cdots \;\; d_r]$; $\mathbf{U}'$ is a matrix $(t - 1) \times t$ of 1s and 0s, and $\boldsymbol{\Gamma} = \{\sigma_{\gamma_{qq'}}\}$ ($q, q' = 1, 2, \ldots, t$; $t$ = number of traits) is a covariance matrix of additive genomic breeding values, $\boldsymbol{\gamma}' = [\gamma_1 \;\; \gamma_2 \;\; \cdots \;\; \gamma_t]$.
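The two forms of the restriction are linked by the fact that $\mathbf{D}'\mathbf{d} = \mathbf{0}$: any vector proportional to $\mathbf{d}$ is annihilated by $\mathbf{D}'$. A minimal sketch that builds $\mathbf{D}'$ from $\mathbf{d}$ and checks this property is given below; the function name is a hypothetical helper, and the example vector is the three-restriction vector $\mathbf{d}' = [7 \;\; 3 \;\; 5]$ used later in the chapter.

```python
def build_d_prime(d):
    """Build the (r-1) x r matrix D' from d = [d_1, ..., d_r].

    Row q has d_r in column q and -d_q in the last column, so D'd = 0.
    """
    r = len(d)
    rows = []
    for q in range(r - 1):
        row = [0.0] * r
        row[q] = d[-1]      # d_r on the diagonal
        row[-1] = -d[q]     # -d_q in the last column
        rows.append(row)
    return rows


d = [7.0, 3.0, 5.0]
d_prime = build_d_prime(d)

# Each row of D' times d should vanish: d_r * d_q - d_q * d_r = 0.
residual = [sum(x * y for x, y in zip(row, d)) for row in d_prime]
```

With a single predetermined gain ($r = 1$) the matrix has no rows, which reflects the fact that one trait alone imposes no proportionality constraint.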

#### 6.2.2 The Maximized PPG-LGSI Parameters

In this subsection, we minimize $E[(H - I_G)^2]$ under the restriction $\mathbf{D}'\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \mathbf{0}$ and later under the restriction $\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \theta_G\mathbf{d}$. Under the restriction $\mathbf{D}'\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \mathbf{0}$, it is necessary to minimize the function

$$f_P(\boldsymbol{\beta}, \mathbf{v}) = \boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta} + \mathbf{w}'\mathbf{C}\mathbf{w} - 2\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta} + 2\mathbf{v}'\mathbf{D}'\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} \tag{6.7}$$

with respect to $\boldsymbol{\beta}$ and $\mathbf{v}' = [v_1 \;\; v_2 \;\; \cdots \;\; v_{r-1}]$, where $\mathbf{v}'$ is a vector of Lagrange multipliers. From a mathematical point of view, Eq. (6.7) has the same form as Eq. (6.1), with $\mathbf{D}'\mathbf{U}'$ in place of $\mathbf{U}'$; thus, the vector of coefficients $\boldsymbol{\beta}$ of the PPG-LGSI has the same form as the vector of coefficients of the RLGSI (Eq. 6.3), i.e., the PPG-LGSI vector of coefficients is equal to

$$
\boldsymbol{\beta}_{PG} = \mathbf{K}_P \mathbf{w}, \tag{6.8}
$$

where now $\mathbf{K}_P = [\mathbf{I}_t - \mathbf{Q}_P]$, $\mathbf{Q}_P = \mathbf{U}\mathbf{D}(\mathbf{D}'\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U}\mathbf{D})^{-1}\mathbf{D}'\mathbf{U}'\boldsymbol{\Gamma}$, $\mathbf{w}$ is a vector of economic weights, and $\mathbf{I}_t$ is an identity matrix $t \times t$. When $\mathbf{D}' = \mathbf{U}'$, $\boldsymbol{\beta}_{PG} = \boldsymbol{\beta}_{RG}$ (the RLGSI vector of coefficients), and when $\mathbf{U}'$ is a null matrix, $\boldsymbol{\beta}_{PG} = \mathbf{w}$ (the LGSI vector of coefficients). This means that the PPG-LGSI includes the RLGSI and the LGSI as particular cases.

Under the restriction $\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \theta_G\mathbf{d}$ (see Chap. 3 for details) the vector of coefficients of the PPG-LGSI can be written as

$$\boldsymbol{\beta}_{PG} = \boldsymbol{\beta}_{RG} + \theta_G \mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{d}, \tag{6.9}$$

where $\boldsymbol{\beta}_{RG} = \mathbf{K}_G\mathbf{w}$ (Eq. 6.3), $\mathbf{K}_G = [\mathbf{I} - \mathbf{Q}_G]$, $\mathbf{Q}_G = \mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{U}'\boldsymbol{\Gamma}$, and $\mathbf{d}' = [d_1 \;\; d_2 \;\; \cdots \;\; d_r]$ is the vector of the predetermined proportional gains imposed by the breeder. It can be shown that $\theta_G$, the proportionality constant, can be written as

$$\theta_G = \frac{\mathbf{d}'(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{U}'\boldsymbol{\Gamma}\mathbf{w}}{\mathbf{d}'(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{d}}. \tag{6.10}$$

When $\theta_G = 0$, $\boldsymbol{\beta}_{PG} = \boldsymbol{\beta}_{RG}$, and when $\mathbf{U}'$ is a null matrix, $\boldsymbol{\beta}_{PG} = \mathbf{w}$. Equations (6.8) and (6.9) give the same results, that is, both equations express the same result in a different mathematical way.

The maximized selection response and expected genetic gain per trait of the PPG-LGSI can be written as

$$R_{PG} = \frac{k_I}{L_G} \sqrt{\boldsymbol{\beta}_{PG}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{PG}} \tag{6.11}$$

and

$$\mathbf{E}_{PG} = \frac{k_I}{L_G} \frac{\boldsymbol{\Gamma}\boldsymbol{\beta}_{PG}}{\sqrt{\boldsymbol{\beta}_{PG}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{PG}}}, \tag{6.12}$$

respectively, where $L_G$ is the time required to complete a selection cycle using the PPG-LGSI. Equations (6.11) and (6.12) depend only on GEBV information.

#### 6.2.3 Statistical Properties of the PPG-LGSI

Assuming that $H = \mathbf{w}'\mathbf{g}$ and the PPG-LGSI ($I_{PG} = \boldsymbol{\beta}_{PG}'\boldsymbol{\gamma}$) have bivariate joint normal distribution, $\boldsymbol{\beta}_{PG} = \mathbf{K}_P\mathbf{w}$, and $\boldsymbol{\Gamma}$, $\mathbf{C}$, and $\mathbf{w}$ are known, it can be shown that the PPG-LGSI has the following statistical properties:


4. The variance of the predicted error, $\mathrm{Var}(H - I_{PG}) = \left(1 - \rho^2_{HI_{PG}}\right)\sigma^2_H$, is minimal.

The statistical PPG-LGSI properties are equal to the statistical PPG-LPSI properties; thus, the PPG-LGSI is an application of the PPG-LPSI to the genomic selection context.

#### 6.2.4 Numerical Example

To illustrate the PPG-LGSI theory, we use the estimated matrix

$$\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.21 & 2.95 & 5.00 \\ 2.95 & 42.41 & 71.11 \\ 5.00 & 71.11 & 121.53 \end{bmatrix}$$

and the GEBVs associated with the traits GY (ton ha$^{-1}$), EHT (cm), and PHT (cm), described in Sect. 6.1.3.

It is necessary to estimate the PPG-LGSI vector of coefficients $\boldsymbol{\beta}_{PG} = \boldsymbol{\beta}_{RG} + \theta_G\mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{d}$ (Eqs. 6.9 and 6.10). In Sect. 6.1.3, we showed that the estimated vectors of coefficients $\boldsymbol{\beta}_{RG} = \mathbf{K}_G\mathbf{w}$ for the null restrictions $\mathbf{U}_1' = [1 \;\; 0 \;\; 0]$ and $\mathbf{U}_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ were $\widehat{\boldsymbol{\beta}}_{RG_1}' = \mathbf{w}'\widehat{\mathbf{K}}_{G_1}' = [3.78 \;\; {-0.1} \;\; {-0.1}]$ and $\widehat{\boldsymbol{\beta}}_{RG_2}' = \mathbf{w}'\widehat{\mathbf{K}}_{G_2}' = [1.12 \;\; 0.09 \;\; {-0.1}]$ respectively, where $\mathbf{w}' = [5 \;\; {-0.1} \;\; {-0.1}]$. This means that to estimate $\boldsymbol{\beta}_{PG} = \boldsymbol{\beta}_{RG} + \theta_G\mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{d}$, we need only to estimate $\theta_G\mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{d}$ for both sets of restrictions.

Consider matrix $\mathbf{U}_1' = [1 \;\; 0 \;\; 0]$ and let $d_1 = 7.0$ be the predetermined proportional gain restriction for trait 1. We can estimate $\theta_G$ and $\mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{d}$ as

$$\widehat{\theta}_{G_1} = \frac{7.0\,(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_1)^{-1}\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}\mathbf{w}}{7.0\,(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_1)^{-1}\,7.0} = 0.036 \quad \text{and} \quad \mathbf{U}_1(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_1)^{-1}\,7.0 = \begin{bmatrix} 33.333 \\ 0 \\ 0 \end{bmatrix},$$

whence the PPG-LGSI vector of coefficients was $\widehat{\boldsymbol{\beta}}_{PG_1} = \widehat{\boldsymbol{\beta}}_{RG_1} + \widehat{\theta}_{G_1}\mathbf{U}_1(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_1)^{-1}\,7.0 = [5.0 \;\; {-0.1} \;\; {-0.1}]'$, and the estimated PPG-LGSI was $\widehat{I}_{PG_1} = 5.0\,\mathrm{GEBV}_1 - 0.1\,\mathrm{GEBV}_2 - 0.1\,\mathrm{GEBV}_3$. In a similar manner, we can estimate the PPG-LGSI vector of coefficients under restrictions $\mathbf{U}_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ and $\mathbf{d}_2' = [7 \;\; 3]$. In this case, $\widehat{\boldsymbol{\beta}}_{PG_2} = \widehat{\boldsymbol{\beta}}_{RG_2} + \widehat{\theta}_{G_2}\mathbf{U}_2(\mathbf{U}_2'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_2)^{-1}\mathbf{d}_2 = [4.97 \;\; {-0.18} \;\; {-0.10}]'$ and the estimated PPG-LGSI was $\widehat{I}_{PG_2} = 4.97\,\mathrm{GEBV}_1 - 0.18\,\mathrm{GEBV}_2 - 0.1\,\mathrm{GEBV}_3$.
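For a single restricted trait the matrix expressions in Eqs. (6.3), (6.9), and (6.10) reduce to scalars, which makes the first part of this example easy to retrace. The sketch below covers only that special case ($r = 1$, trait GY restricted, $d_1 = 7.0$); the variable names are illustrative and the economic weights are taken as $[5, -0.1, -0.1]$.

```python
gamma_hat = [[0.21, 2.95, 5.00],
             [2.95, 42.41, 71.11],
             [5.00, 71.11, 121.53]]
w = [5.0, -0.1, -0.1]

i = 0      # index of the restricted trait (GY), i.e. U' = [1 0 0]
d = 7.0    # predetermined proportional gain for that trait

# With one restriction, U'ΓU and U'Γw are scalars.
s = 1.0 / gamma_hat[i][i]                            # (U'ΓU)^{-1}
a = sum(g * wj for g, wj in zip(gamma_hat[i], w))    # U'Γw

theta = (d * s * a) / (d * s * d)                    # Eq. (6.10)

beta_rg = list(w)
beta_rg[i] -= s * a                                  # Eq. (6.3): w - Q_G w

beta_pg = list(beta_rg)
beta_pg[i] += theta * s * d                          # Eq. (6.9)
```

In this special case the correction term $\theta_G\mathbf{U}(\mathbf{U}'\boldsymbol{\Gamma}\mathbf{U})^{-1}\mathbf{d}$ equals $\mathbf{Q}_G\mathbf{w}$ and cancels the restriction term exactly, which is why the PPG-LGSI coefficients coincide with the economic weights, whatever the value of $d_1$.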

Figure 6.2 presents the frequency distribution of the estimated PPG-LGSI values for one (Fig. 6.2a) and two (Fig. 6.2b) predetermined restrictions, $d = 7$ and $\mathbf{d}' = [7 \;\; 3]$ respectively, obtained in a real testing population for one selection cycle in one environment. For both restrictions, the frequency distribution of the estimated PPG-LGSI values approaches the normal distribution.

Fig. 6.2 Distribution of 380 estimated predetermined proportional gain linear genomic selection index (PPG-LGSI) values with one (a) and two (b) predetermined restrictions, $d = 7$ and $\mathbf{d}' = [7 \;\; 3]$ respectively, obtained in a real testing population for one selection cycle in one environment

Assume a selection intensity of 10% ($k_{I_G} = 1.755$); then, for one predetermined restriction, where $\mathbf{U}_1' = [1 \;\; 0 \;\; 0]$ and $d_1 = 7.0$, the estimated PPG-LGSI selection response and expected genetic gain per trait, not including the interval length, were $\widehat{R}_{PG_1} = k_{I_G}\sqrt{\widehat{\boldsymbol{\beta}}_{PG_1}'\widehat{\boldsymbol{\Gamma}}\widehat{\boldsymbol{\beta}}_{PG_1}} = 1.05$ and $\widehat{\mathbf{E}}_{PG_1}' = k_{I_G}\dfrac{\widehat{\boldsymbol{\beta}}_{PG_1}'\widehat{\boldsymbol{\Gamma}}}{\sqrt{\widehat{\boldsymbol{\beta}}_{PG_1}'\widehat{\boldsymbol{\Gamma}}\widehat{\boldsymbol{\beta}}_{PG_1}}} = [0.74 \;\; 9.92 \;\; 16.54]$ respectively. For two restrictions, with $\mathbf{U}_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ and $\mathbf{d}' = [7 \;\; 3]$, the estimated PPG-LGSI selection response and expected genetic gains, not including the interval length, were $\widehat{R}_{PG_2} = k_{I_G}\sqrt{\widehat{\boldsymbol{\beta}}_{PG_2}'\widehat{\boldsymbol{\Gamma}}\widehat{\boldsymbol{\beta}}_{PG_2}} = 0.52$ and $\widehat{\mathbf{E}}_{PG_2}' = k_{I_G}\dfrac{\widehat{\boldsymbol{\beta}}_{PG_2}'\widehat{\boldsymbol{\Gamma}}}{\sqrt{\widehat{\boldsymbol{\beta}}_{PG_2}'\widehat{\boldsymbol{\Gamma}}\widehat{\boldsymbol{\beta}}_{PG_2}}} = [0.11 \;\; 0.05 \;\; 0.14]$ respectively.

Now, we use the simulated data set described in Chap. 2, Sect. 2.8.1 to compare PPG-LGSI efficiency versus predetermined proportional gain linear phenotypic selection index (PPG-LPSI) efficiency. Let

$$\mathbf{U}_1' = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}, \quad \mathbf{U}_2' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{U}_3' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

be the matrices and $d_1 = 7$, $\mathbf{d}_2' = [7 \;\; 3]$, and $\mathbf{d}_3' = [7 \;\; 3 \;\; 5]$ the vectors for one, two, and three predetermined restrictions respectively. Table 6.5 presents the estimated PPG-LPSI and PPG-LGSI selection response for each predetermined restriction in five simulated selection cycles including and not including the interval between selection cycles (4 years for the PPG-LPSI and 1.5 years for the PPG-LGSI); estimated PPG-LPSI and PPG-LGSI accuracy; and estimated variance of the predicted error (VPE). In each selection cycle, the sample size was equal to 500 genotypes, each with four repetitions and four traits. The selection intensity was 10% ($k_I = 1.755$).

The averages of the estimated PPG-LPSI selection response not including the interval length were 15.14, 14.87, and 13.30, whereas when the interval length was included, the average selection responses were 3.79, 3.72, and 3.33, for one, two, and three predetermined restrictions respectively (Table 6.5). The averages of the estimated PPG-LGSI selection responses not including the interval length for one, two, and three predetermined restrictions were 14.48, 13.47, and 11.26 respectively, and when the interval length was included, the selection responses were 9.65, 8.98, and 7.51 respectively (Table 6.5). These results indicate that when the interval length was included in the estimation of the PPG-LPSI and PPG-LGSI selection responses, PPG-LGSI efficiency was greater than PPG-LPSI efficiency, and vice versa, when the interval length was not included in the PPG-LPSI and PPG-LGSI selection responses, PPG-LPSI efficiency was higher than PPG-LGSI efficiency.
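The effect of the interval length can be checked directly from the averages quoted above. The following is a minimal sketch (the variable names are illustrative and not from the source) that divides each average response by the interval between selection cycles, 4 years for the PPG-LPSI and 1.5 years for the PPG-LGSI, and compares the two indices per unit of time.

```python
# Average estimated selection responses quoted in the text (Table 6.5)
# for one, two, and three predetermined restrictions, per selection cycle.
ppg_lpsi = [15.14, 14.87, 13.30]   # phenotypic index
ppg_lgsi = [14.48, 13.47, 11.26]   # genomic index

L_P, L_G = 4.0, 1.5                # interval lengths in years

# Response per year: divide the per-cycle response by the interval length.
lpsi_per_year = [r / L_P for r in ppg_lpsi]
lgsi_per_year = [r / L_G for r in ppg_lgsi]

for n, (p, g) in enumerate(zip(lpsi_per_year, lgsi_per_year), start=1):
    print(f"{n} restriction(s): PPG-LPSI {p:.3f}, PPG-LGSI {g:.3f}, "
          f"ratio {g / p:.2f}")
```

Per cycle the phenotypic index responds more, but per year the genomic index responds more because its cycle is shorter.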

The averages of the estimated VPE values of the PPG-LPSI for one, two, and three predetermined restrictions were 22.42, 30.56, and 41.17 respectively, whereas the estimated VPE values of the PPG-LGSI (see Sect. 6.2.3 for details) were 59.80, 66.95, and 83.98, respectively, that is, in all selection cycles, the VPE of the PPG-LPSI was lower than that of the PPG-LGSI. This means that for this data set, the PPG-LPSI was a better predictor of the net genetic merit than the PPG-LGSI. These results can be explained by observing that the averages of the estimated PPG-LPSI accuracies were 0.88, 0.86, and 0.77, whereas the estimated PPG-LGSI accuracies were 0.65, 0.68, and 0.57 for each predetermined restriction, that is, the estimated PPG-LGSI accuracies were lower than the estimated PPG-LPSI accuracies for this data set.


Table 6.5 Estimated predetermined proportional gain linear phenotypic and genomic selection index (PPG-LPSI and PPG-LGSI respectively) selection responses for 1, 2 and 3 predetermined restrictions for five simulated selection cycles including and not including the interval between selection cycles (4 years for the PPG-LPSI and 1.5 years for the PPG-LGSI)

<sup>a</sup>The estimated PPG-LPSI selection response was divided by 4 and the estimated PPG-LGSI selection response was divided by 1.5

Table 6.6 Estimated PPG-LPSI heritability ($\widehat{h}^2_P$), values of $W_P = \frac{\widehat{\rho}_{HI_G}}{\widehat{h}_P}L_P$ ($L_P = 4$), and the ratio of the estimated PPG-LPSI accuracy ($\widehat{\rho}_{HI_P}$) to the estimated PPG-LGSI accuracy ($\widehat{\rho}_{HI_{PG}}$): $\widehat{\lambda}_P = \widehat{\rho}_{HI_P}/\widehat{\rho}_{HI_{PG}}$, and values of $\widehat{p} = 100(\widehat{\lambda}_P - 1)$ for 1, 2 and 3 predetermined restrictions for five simulated selection cycles


Table 6.6 presents the estimated PPG-LPSI heritability ($\widehat{h}^2_P$) values, the $W_P = \frac{\widehat{\rho}_{HI_G}}{\widehat{h}_P}L_P$ ($L_P = 4$) values, the ratio of the estimated PPG-LPSI accuracy ($\widehat{\rho}_{HI_P}$) to the estimated PPG-LGSI accuracy ($\widehat{\rho}_{HI_{PG}}$), i.e., $\widehat{\lambda}_P = \widehat{\rho}_{HI_P}/\widehat{\rho}_{HI_{PG}}$, and, finally, the values of $\widehat{p} = 100(\widehat{\lambda}_P - 1)$ for one, two, and three predetermined restrictions for five simulated selection cycles.

The averages of the $W_P$ values for one, two, and three predetermined restrictions were 3.29, 3.12, and 2.53, respectively, whereas the PPG-LGSI interval length was 1.5 ($L_G = 1.5$). This means that the estimated Technow inequality, $L_G < \frac{\widehat{\rho}_{HI_G}}{\widehat{h}_P}L_P$ (see Chap. 5, Eq. 5.18), was true. Thus, PPG-LGSI efficiency in terms of time was greater than PPG-LPSI efficiency for this data set. These results coincide with those obtained earlier in this chapter, when we compared PPG-LGSI efficiency versus PPG-LPSI efficiency in terms of interval length. However, the average values of $\widehat{p} = 100(\widehat{\lambda}_P - 1)$ (see Chap. 5, Eq. 5.15) were, in percentage terms, 16.80%, 20.76%, and 25.85% for each restriction. These latter results indicate that for this data set, the PPG-LPSI was a better predictor of the net genetic merit than the PPG-LGSI. This is because the estimated PPG-LPSI accuracies were higher than the estimated PPG-LGSI accuracies for this data set. We found similar results when we compared the PPG-LPSI VPE versus PPG-LGSI VPE (Table 6.5).
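The two comparison criteria used here are simple enough to express in a few lines. The sketch below assumes the forms $W = (\rho_{HI_G}/h)L$ for the right-hand side of the Technow inequality and $p = 100(\lambda - 1)$ for the accuracy ratio; the function names are hypothetical and chosen only for illustration.

```python
def technow_w(rho_hig, h, interval):
    """Right-hand side of the Technow inequality: W = (rho_HIG / h) * L."""
    return rho_hig / h * interval

def technow_holds(l_g, w):
    """True when L_G < W, i.e. the genomic index is more efficient per unit of time."""
    return l_g < w

def percent_gain(lam):
    """Accuracy ratio lambda expressed as a percentage: p = 100 * (lambda - 1)."""
    return 100.0 * (lam - 1.0)

# Average W_P values quoted in the text for one, two, and three restrictions.
w_p_averages = [3.29, 3.12, 2.53]
L_G = 1.5
inequality_true = [technow_holds(L_G, w) for w in w_p_averages]
print(inequality_true)
```

A positive value of `percent_gain` means the index in the numerator of the ratio is the more accurate one; the time criterion and the accuracy criterion can therefore point in opposite directions, as they do for this data set.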

#### 6.3 The Combined Restricted Linear Genomic Selection Index

The combined restricted linear genomic selection index (CRLGSI) is based on the RLPSI (Chap. 3) and combined linear genomic selection index (CLGSI, Chap. 5) theory. In the RLPSI, the breeder's objective is to improve only $(t - r)$ of $t$ ($r < t$) traits, leaving $r$ of them fixed; the same is true for the CRLGSI, but in the latter case, it is necessary to impose $2r$ restrictions, i.e., we need to fix $r$ traits and their associated $r$ GEBVs to obtain results similar to those obtained with the RLPSI. This is the main difference between the CRLGSI and the RLPSI.

It can be shown that $\mathrm{Cov}(I_C, \mathbf{a}_C) = \boldsymbol{\Psi}_C\boldsymbol{\beta}_C$ is the covariance between the breeding value vector ($\mathbf{a}'_C = [\mathbf{g}'\ \ \boldsymbol{\gamma}']$) and the CLGSI, $I_C = \boldsymbol{\beta}'_C\mathbf{t}_C$ (see Chap. 5 for details), where $\mathbf{t}'_C = [\mathbf{y}'\ \ \boldsymbol{\gamma}']$. In the CRLGSI, we want some covariances between the linear combinations of $\mathbf{a}_C$ ($\mathbf{U}'_C\mathbf{a}_C$) and the CLGSI to be zero, i.e., $\mathrm{Cov}(I_C, \mathbf{U}'_C\mathbf{a}_C) = \mathbf{U}'_C\boldsymbol{\Psi}_C\boldsymbol{\beta}_C = \mathbf{0}$, where $\mathbf{U}'_C$ is a $2(t - 1) \times 2t$ matrix of 1s and 0s (1 indicates that the trait and its associated GEBV are restricted, and 0 that the trait and its GEBV have no restrictions) and $\boldsymbol{\Psi}_C = \begin{bmatrix} \mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ is a block covariance matrix of $\mathbf{a}'_C = [\mathbf{g}'\ \ \boldsymbol{\gamma}']$, where $\mathbf{C}$ and $\boldsymbol{\Gamma}$ are the covariance matrices of breeding ($\mathbf{g}$) and genomic ($\boldsymbol{\gamma}$) values respectively. This problem can be solved by minimizing the mean squared difference between the CLGSI and $H$, $E[(H - I_C)^2]$, under the restriction $\mathbf{U}'_C\boldsymbol{\Psi}_C\boldsymbol{\beta}_C = \mathbf{0}$, similar to the RLGSI in Sect. 6.1.

#### 6.3.1 The Maximized CRLGSI Parameters

Let $\mathbf{T}_C = \begin{bmatrix} \mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ be the block covariance matrix of $\mathbf{t}'_C = [\mathbf{y}'\ \ \boldsymbol{\gamma}']$, where $\mathbf{P}$ and $\boldsymbol{\Gamma}$ are the covariance matrices of phenotypic ($\mathbf{y}$) and genomic ($\boldsymbol{\gamma}$) values respectively. Based on the Eq. (6.1) result, it can be shown that the CRLGSI vector of coefficients that minimizes $E[(H - I_C)^2]$ under the restriction $\mathbf{U}'_C\boldsymbol{\Psi}_C\boldsymbol{\beta}_C = \mathbf{0}$ is

$$
\boldsymbol{\beta}_{CR} = \mathbf{K}_C \boldsymbol{\beta}_C,\tag{6.13}
$$

where $\mathbf{K}_C = [\mathbf{I} - \mathbf{Q}_C]$, $\mathbf{Q}_C = \mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\left(\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\right)^{-1}\boldsymbol{\Phi}'_C$, $\boldsymbol{\Phi}'_C = \mathbf{U}'_C\boldsymbol{\Psi}_C$, and $\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\boldsymbol{\Psi}_C\mathbf{a}_C$ (the vector of coefficients of the CLGSI, see Chap. 5 for details); $\mathbf{T}_C^{-1}$ is the inverse of matrix $\mathbf{T}_C$, and $\mathbf{I}$ is a $2t \times 2t$ identity matrix. When no restrictions are imposed on any of the traits, $\mathbf{U}'_C$ is a null matrix and $\boldsymbol{\beta}_{CR} = \boldsymbol{\beta}_C$ (the vector of coefficients of the CLGSI). That is, the CRLGSI is more general than the CLGSI. Similar to the RLPSI and the RLGSI, matrices $\mathbf{K}_C$ and $\mathbf{Q}_C$ are idempotent ($\mathbf{K}_C = \mathbf{K}_C^2$ and $\mathbf{Q}_C = \mathbf{Q}_C^2$) and orthogonal ($\mathbf{K}_C\mathbf{Q}_C = \mathbf{Q}_C\mathbf{K}_C = \mathbf{0}$), that is, $\mathbf{K}_C$ and $\mathbf{Q}_C$ are projectors. Thus, we can assume that the CRLGSI has similar properties to those described for the RLPSI (see Chap. 3 for details) when matrices $\boldsymbol{\Psi}_C = \begin{bmatrix} \mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ and $\mathbf{T}_C = \begin{bmatrix} \mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ are known.
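The projector properties can be verified in a few lines. The following is a sketch that uses only the definitions of this section, with $\boldsymbol{\Phi}'_C = \mathbf{U}'_C\boldsymbol{\Psi}_C$ and assuming that $\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C$ is invertible:

$$
\begin{aligned}
\mathbf{Q}_C^2 &= \mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\left(\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\right)^{-1}\left(\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\right)\left(\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\right)^{-1}\boldsymbol{\Phi}'_C = \mathbf{Q}_C, \\
\mathbf{K}_C^2 &= \mathbf{I} - 2\mathbf{Q}_C + \mathbf{Q}_C^2 = \mathbf{I} - \mathbf{Q}_C = \mathbf{K}_C, \\
\mathbf{K}_C\mathbf{Q}_C &= \mathbf{Q}_C - \mathbf{Q}_C^2 = \mathbf{0}, \\
\boldsymbol{\Phi}'_C\boldsymbol{\beta}_{CR} &= \boldsymbol{\Phi}'_C\boldsymbol{\beta}_C - \left(\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\right)\left(\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\right)^{-1}\boldsymbol{\Phi}'_C\boldsymbol{\beta}_C = \mathbf{0}.
\end{aligned}
$$

The last line confirms that $\boldsymbol{\beta}_{CR}$ satisfies the restriction $\mathbf{U}'_C\boldsymbol{\Psi}_C\boldsymbol{\beta}_{CR} = \mathbf{0}$ by construction.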

The maximized selection response and the optimized expected genetic gain per trait of the CRLGSI can be written as


$$R_{CR} = \frac{k_I}{L_I} \sqrt{\boldsymbol{\beta}_{CR}' \mathbf{T}_C \boldsymbol{\beta}_{CR}} \tag{6.14}$$

and

$$\mathbf{E}_{CR} = \frac{k_I}{L_I} \frac{\boldsymbol{\Psi}_C \boldsymbol{\beta}_{CR}}{\sqrt{\boldsymbol{\beta}_{CR}' \mathbf{T}_C \boldsymbol{\beta}_{CR}}},\tag{6.15}$$

respectively. Although in the RLGSI and the PPG-LGSI the interval between selection cycles is denoted as LG, in the CRLGSI it is denoted as LI. This is because the RLPSI and the CRLGSI should have the same interval between selection cycles.

#### 6.3.2 Numerical Examples

To illustrate the CRLGSI theoretical results, we use a real training maize (*Zea mays*) F2 population with 248 genotypes (each with two repetitions), 233 molecular markers, and three traits: GY (ton ha$^{-1}$), EHT (cm), and PHT (cm). Matrices $\mathbf{P}$ and $\mathbf{C}$ were estimated based on Eqs. (2.22) to (2.24) described in Chap. 2. The estimated matrices were $\widehat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix}$ and $\widehat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}$. In a similar manner, we estimated matrix $\boldsymbol{\Gamma}$ using Eqs. (5.21) to (5.23) described in Chap. 5. The estimated matrix was $\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}$.

To estimate the CRLGSI and its associated parameters (selection response, expected genetic gain per trait, etc.), we need to obtain matrices $\widehat{\mathbf{T}}_C = \begin{bmatrix} \widehat{\mathbf{P}} & \widehat{\boldsymbol{\Gamma}} \\ \widehat{\boldsymbol{\Gamma}} & \widehat{\boldsymbol{\Gamma}} \end{bmatrix}$ and $\widehat{\boldsymbol{\Psi}}_C = \begin{bmatrix} \widehat{\mathbf{C}} & \widehat{\boldsymbol{\Gamma}} \\ \widehat{\boldsymbol{\Gamma}} & \widehat{\boldsymbol{\Gamma}} \end{bmatrix}$ using phenotypic and genomic information and the estimated CRLGSI vector of coefficients $\widehat{\boldsymbol{\beta}}_{CR} = \widehat{\mathbf{K}}_C\widehat{\boldsymbol{\beta}}_C$, where $\widehat{\mathbf{K}}_C = [\mathbf{I} - \widehat{\mathbf{Q}}_C]$, $\widehat{\mathbf{Q}}_C = \widehat{\mathbf{T}}_C^{-1}\widehat{\boldsymbol{\Phi}}_C\left(\widehat{\boldsymbol{\Phi}}'_C\widehat{\mathbf{T}}_C^{-1}\widehat{\boldsymbol{\Phi}}_C\right)^{-1}\widehat{\boldsymbol{\Phi}}'_C$, $\widehat{\boldsymbol{\Phi}}'_C = \mathbf{U}'_C\widehat{\boldsymbol{\Psi}}_C$, and $\widehat{\boldsymbol{\beta}}_C = \widehat{\mathbf{T}}_C^{-1}\widehat{\boldsymbol{\Psi}}_C\mathbf{a}_C$.

We have indicated that the main difference between the RLGSI and the CRLGSI is matrix $\mathbf{U}'_C$, on which we now need to impose two restrictions: one for the trait and another for its associated GEBV. Consider the (*Zea mays*) F2 population described earlier and suppose that we restrict trait GY; then, matrix $\mathbf{U}'_C$ should be constructed as $\mathbf{U}'_{C_1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$. If we restrict traits GY and EHT, matrix $\mathbf{U}'_C$ should be constructed as $\mathbf{U}'_{C_2} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$, etc. The procedure for obtaining matrices $\widehat{\mathbf{K}}_C = [\mathbf{I} - \widehat{\mathbf{Q}}_C]$, $\widehat{\mathbf{Q}}_C = \widehat{\mathbf{T}}_C^{-1}\widehat{\boldsymbol{\Phi}}_C\left(\widehat{\boldsymbol{\Phi}}'_C\widehat{\mathbf{T}}_C^{-1}\widehat{\boldsymbol{\Phi}}_C\right)^{-1}\widehat{\boldsymbol{\Phi}}'_C$, and $\widehat{\boldsymbol{\Phi}}'_C = \mathbf{U}'_C\widehat{\boldsymbol{\Psi}}_C$ is similar to that described in Chap. 3.
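The pattern behind these restriction matrices (one row per restricted trait, followed by one row per associated GEBV) can be generated programmatically. The helper below is a hypothetical illustration, not part of the source; it assumes traits are indexed from 0 in the order GY, EHT, PHT.

```python
import numpy as np

def restriction_matrix(restricted, t):
    """Build U'_C (2r x 2t) for the restricted trait indices.

    The first r rows select the restricted traits; the last r rows
    select the GEBVs associated with those same traits.
    """
    r = len(restricted)
    U = np.zeros((2 * r, 2 * t), dtype=int)
    for row, trait in enumerate(restricted):
        U[row, trait] = 1           # restriction on the trait itself
        U[r + row, t + trait] = 1   # restriction on its associated GEBV
    return U

U_C1 = restriction_matrix([0], t=3)      # restrict GY
U_C2 = restriction_matrix([0, 1], t=3)   # restrict GY and EHT
print(U_C1)
print(U_C2)
```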

Let $\mathbf{w}' = [5\ \ {-0.1}\ \ {-0.1}\ \ 0\ \ 0\ \ 0]$ be the vector of economic weights and assume that we restrict trait GY; in this case, according to the estimated matrices $\widehat{\mathbf{P}}$, $\widehat{\mathbf{C}}$, and $\widehat{\boldsymbol{\Gamma}}$ described earlier, the estimated CRLGSI vector of coefficients was $\widehat{\boldsymbol{\beta}}'_{CR} = [0.076\ \ {-0.004}\ \ {-0.018}\ \ 2.353\ \ {-0.096}\ \ {-0.082}]$, whence the estimated CRLGSI can be written as

$$\begin{array}{l} \widehat{I}_{CR} = 0.076\,\text{GY} - 0.004\,\text{EHT} - 0.018\,\text{PHT} + 2.353\,\text{GEBV}_{\text{GY}} - 0.096\,\text{GEBV}_{\text{EHT}} \\ \qquad\ - 0.082\,\text{GEBV}_{\text{PHT}} \end{array}$$

where GEBVGY, GEBVEHT, and GEBVPHT are the GEBVs associated with traits GY, EHT, and PHT respectively. The same procedure is valid for two or more restrictions.

Figure 6.3 presents the frequency distribution of the estimated CRLGSI values for one (Fig. 6.3a) and two null restrictions (Fig. 6.3b) using matrices $\mathbf{U}'_{C_1}$ and $\mathbf{U}'_{C_2}$, and the real data set of the F2 population. For both restrictions, the frequency distribution of the estimated CRLGSI values approaches a normal distribution.

Suppose a selection intensity of 10% ($k_I = 1.755$), matrix $\mathbf{U}'_{C_1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$, and that the vector of economic weights is $\mathbf{w}' = [5\ \ {-0.1}\ \ {-0.1}\ \ 0\ \ 0\ \ 0]$; then, according to the estimated matrices $\widehat{\mathbf{P}}$, $\widehat{\mathbf{C}}$, and $\widehat{\boldsymbol{\Gamma}}$ described earlier, the estimated CRLGSI selection response and the estimated CRLGSI expected genetic gain per trait were $\widehat{R}_{CR} = k_I\sqrt{\widehat{\boldsymbol{\beta}}'_{CR}\widehat{\mathbf{T}}_C\widehat{\boldsymbol{\beta}}_{CR}} = 0.96$ and $\widehat{\mathbf{E}}'_{CR} = k_I\frac{\widehat{\boldsymbol{\beta}}'_{CR}\widehat{\boldsymbol{\Psi}}_C}{\sqrt{\widehat{\boldsymbol{\beta}}'_{CR}\widehat{\mathbf{T}}_C\widehat{\boldsymbol{\beta}}_{CR}}} = [0\ \ {-3.53}\ \ {-6.03}\ \ 0\ \ {-2.93}\ \ {-4.87}]$ respectively, whereas the estimated CRLGSI accuracy was $\widehat{\rho}_{HI_{CR}} = \frac{\widehat{\sigma}_{I_{CR}}}{\widehat{\sigma}_H} = 0.51$ (see Chaps. 3 and 5 for details). The expected genetic gains of GY and of its associated GEBV are zero, as required by the restriction.
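The one-restriction example can be retraced numerically from the rounded matrices printed above. The following NumPy sketch is an illustration under stated assumptions rather than the authors' code: it takes $\boldsymbol{\Phi}'_C = \mathbf{U}'_C\boldsymbol{\Psi}_C$, uses the economic weights $[5, -0.1, -0.1, 0, 0, 0]$ (negative weights on EHT and PHT), and omits the interval length $L_I$. Because the inputs are rounded to two decimals, the results should be close to, but need not coincide exactly with, the reported values.

```python
import numpy as np

# Estimated covariance matrices as printed in the text (rounded).
P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
G = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])

T = np.block([[P, G], [G, G]])      # covariance of t_C = [y' gamma']'
Psi = np.block([[C, G], [G, G]])    # covariance of a_C = [g' gamma']'

w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])   # economic weights (assumed signs)
U_t = np.array([[1, 0, 0, 0, 0, 0],
                [0, 0, 0, 1, 0, 0]])              # U'_C1: restrict GY and GEBV_GY
k_I = 1.755                                       # 10% selection intensity

T_inv = np.linalg.inv(T)
Phi = (U_t @ Psi).T                               # Phi_C, so that Phi_C' = U'_C Psi_C
beta_C = T_inv @ Psi @ w                          # unrestricted CLGSI coefficients
Q = T_inv @ Phi @ np.linalg.inv(Phi.T @ T_inv @ Phi) @ Phi.T
beta_CR = (np.eye(6) - Q) @ beta_C                # Eq. (6.13)

sigma_I = np.sqrt(beta_CR @ T @ beta_CR)          # standard deviation of the index
R_CR = k_I * sigma_I                              # Eq. (6.14) without L_I
E_CR = k_I * (Psi @ beta_CR) / sigma_I            # Eq. (6.15) without L_I
accuracy = sigma_I / np.sqrt(w @ Psi @ w)         # sigma_I / sigma_H

print("beta_CR :", np.round(beta_CR, 3))
print("R_CR    :", round(float(R_CR), 2))
print("E_CR    :", np.round(E_CR, 2))
print("accuracy:", round(float(accuracy), 2))
```

The first and fourth entries of `E_CR` vanish up to floating-point error, which is the numerical counterpart of the null restriction on GY and its GEBV.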

Now, we use the simulated data described in Chap. 2, Sect. 2.8.1 to compare CRLGSI efficiency versus RLGSI efficiency. The criteria for this comparison are the Technow inequality (Eq. 5.18, Chap. 5) and the ratio of the estimated CRLGSI accuracy ($\widehat{\rho}_{HI_{CR}}$) to the estimated RLGSI accuracy ($\widehat{\rho}_{HI_R}$) expressed as percentages (Eq. 5.17, Chap. 5), i.e., $\widehat{p} = 100(\widehat{\lambda}_{CR} - 1)$, where $\widehat{\lambda}_{CR} = \widehat{\rho}_{HI_{CR}}/\widehat{\rho}_{HI_R}$, for one, two, and three null restrictions for five simulated selection cycles.

Table 6.7 presents the estimated CRLGSI heritability ($\widehat{h}^2_C$), the estimated RLGSI accuracy ($\widehat{\rho}_{HI_R}$), the values of $W_C = \frac{\widehat{\rho}_{HI_R}}{\widehat{h}_C}L_I$ ($L_I = 4$), and the values of $\widehat{p} = 100(\widehat{\lambda}_{CR} - 1)$, where $\widehat{\lambda}_{CR} = \widehat{\rho}_{HI_{CR}}/\widehat{\rho}_{HI_R}$ and $\widehat{\rho}_{HI_{CR}}$ is the estimated CRLGSI accuracy, for one, two, and three null restrictions for five simulated selection cycles. The averages of the $W_C = \frac{\widehat{\rho}_{HI_R}}{\widehat{h}_C}L_I$ values for one, two, and three null restrictions were 1.26, 0.92, and 0.59 respectively, whereas the RLGSI interval length was 1.5 ($L_G = 1.5$). This means that the estimated Technow inequality ($L_G < \frac{\widehat{\rho}_{HI_G}}{\widehat{h}_I}L_I$) was not true. Thus, for this data set, RLGSI efficiency in terms of time is not greater than CRLGSI efficiency. The inequality $L_G < \frac{\widehat{\rho}_{HI_G}}{\widehat{h}_I}L_I$ was not true because the estimated RLGSI accuracy was very low, whereas CRLGSI heritability was high. Thus, note that the averages of the estimated RLGSI accuracy for one, two, and three null restrictions were 0.25, 0.19, and 0.14 respectively, whereas the averages of the estimated CRLGSI heritability values were 0.72, 0.75, and 0.89 respectively. Thus, according to these results, when the estimated RLGSI accuracy is very low and the estimated CRLGSI heritability is high, RLGSI efficiency will be lower than CRLGSI efficiency in terms of time.

Fig. 6.3 Distribution of 244 estimated combined restricted linear genomic selection index (CRLGSI) values with one (a) and two (b) null restrictions respectively obtained in a real training population for one selection cycle in one environment


Table 6.7 Estimated combined restricted linear genomic selection index (CRLGSI) heritability ($\widehat{h}^2_C$), estimated RLGSI accuracy ($\widehat{\rho}_{HI_R}$), values of $W_C = \frac{\widehat{\rho}_{HI_R}}{\widehat{h}_C}L_I$

The last three columns of Table 6.7, from left to right, present the average of the values of $\widehat{p} = 100(\widehat{\lambda}_{CR} - 1)$, for one, two, and three null restrictions of five simulated selection cycles. According to these results, CRLGSI efficiency was 53.78%, 78.25%, and 61.25% higher than RLGSI efficiency. Thus, for this data set, the CRLGSI was a better predictor of the net genetic merit than the RLGSI.

#### 6.4 The Combined Predetermined Proportional Gains Linear Genomic Selection Index

In the PPG-LPSI described in Chap. 3, the vector of the PPG (predetermined proportional gains) was $\mathbf{d}' = [d_1\ \ d_2\ \ \cdots\ \ d_r]$. However, because the combined predetermined proportional gains LGSI (CPPG-LGSI) uses phenotypic and GEBV information jointly to predict the net genetic merit, the vector of the PPG ($\mathbf{d}_C$) should be twice as long as the standard vector $\mathbf{d}'$, that is, $\mathbf{d}'_C = [d_1\ \ d_2\ \ \cdots\ \ d_r\ \ d_{r+1}\ \ d_{r+2}\ \ \cdots\ \ d_{2r}]$, where we would expect that if $d_1$ is the PPG imposed on trait 1, then $d_{r+1}$ should be the PPG imposed on the GEBV associated with trait 1, etc. In addition, in the CPPG-LGSI, we have three possible options for determining (for each trait and GEBV) the PPG, e.g., for trait 1, $d_1 = d_{r+1}$, $d_1 > d_{r+1}$, or $d_1 < d_{r+1}$. This is the main difference between the standard PPG-LPSI described in Chap. 3 and the CPPG-LGSI.

#### 6.4.1 The Maximized CPPG-LGSI Parameters

It can be shown that the vector of coefficients of the CPPG-LGSI can be written as

$$
\boldsymbol{\beta}_{CP} = \boldsymbol{\beta}_{CR} + \theta_{CP}\boldsymbol{\delta}_{CP}, \tag{6.16}
$$

where

$$\theta_{CP} = \frac{\boldsymbol{\beta}_{C}^{\prime} \boldsymbol{\Phi}_{C} \left( \boldsymbol{\Phi}_{C}^{\prime} \mathbf{T}_{C}^{-1} \boldsymbol{\Phi}_{C} \right)^{-1} \mathbf{d}_{C}}{\mathbf{d}_{C}^{\prime} \left( \boldsymbol{\Phi}_{C}^{\prime} \mathbf{T}_{C}^{-1} \boldsymbol{\Phi}_{C} \right)^{-1} \mathbf{d}_{C}} \tag{6.17}$$

is a proportionality constant. In addition, in Eq. (6.16), $\boldsymbol{\beta}_{CR} = \mathbf{K}_C\boldsymbol{\beta}_C$ is the vector of coefficients of the CRLGSI (Eq. 6.13), $\boldsymbol{\delta}_{CP} = \mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\left(\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\right)^{-1}\mathbf{d}_C$, $\boldsymbol{\Phi}'_C = \mathbf{U}'_C\boldsymbol{\Psi}_C$, and $\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\boldsymbol{\Psi}_C\mathbf{a}_C$ (the vector of coefficients of the CLGSI). When $\theta_{CP} = 0$, $\boldsymbol{\beta}_{CP} = \boldsymbol{\beta}_{CR}$, and if $\theta_{CP} = 0$ and $\mathbf{U}'_C$ is the null matrix, then $\boldsymbol{\beta}_{CP} = \boldsymbol{\beta}_C$. Thus, the CPPG-LGSI is more general than the CRLGSI and the CLGSI, and includes the latter two indices as particular cases. In addition, it can be shown that the CPPG-LGSI has the same properties as the PPG-LPSI described in Chap. 3.
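A one-line check, sketched here from the definitions of $\boldsymbol{\beta}_{CR}$ and $\boldsymbol{\delta}_{CP}$ with $\boldsymbol{\Phi}'_C = \mathbf{U}'_C\boldsymbol{\Psi}_C$, shows why the index delivers gains proportional to $\mathbf{d}_C$ on the restricted traits and GEBVs:

$$
\begin{aligned}
\boldsymbol{\Phi}'_C\boldsymbol{\beta}_{CP}
&= \boldsymbol{\Phi}'_C\boldsymbol{\beta}_{CR} + \theta_{CP}\,\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\left(\boldsymbol{\Phi}'_C\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C\right)^{-1}\mathbf{d}_C \\
&= \mathbf{0} + \theta_{CP}\,\mathbf{d}_C .
\end{aligned}
$$

The covariances between the index and the restricted components of $\mathbf{a}_C$ are therefore $\theta_{CP}\mathbf{d}_C$, so their expected genetic gains stand in the ratios fixed by $\mathbf{d}_C$, and they vanish when $\theta_{CP} = 0$, which recovers the CRLGSI.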

The maximized selection response and the expected genetic gain per trait of the CPPG-LGSI can be written as

$$R_{CP} = \frac{k_I}{L_I} \sqrt{\boldsymbol{\beta}_{CP}' \mathbf{T}_C \boldsymbol{\beta}_{CP}} \tag{6.18}$$

and

$$\mathbf{E}_{CP} = \frac{k_I}{L_I} \frac{\boldsymbol{\Psi}_C \boldsymbol{\beta}_{CP}}{\sqrt{\boldsymbol{\beta}_{CP}^\prime \mathbf{T}_C \boldsymbol{\beta}_{CP}}},\tag{6.19}$$

respectively. Although in the RLGSI and the PPG-LGSI the interval between selection cycles is denoted as LG, in the CPPG-LGSI it is denoted as LI. This is because the RLPSI and the CPPG-LGSI should have the same interval between selection cycles because they use phenotypic information to predict the net genetic merit.

#### 6.4.2 Numerical Examples

Similar to the CRLGSI, to illustrate the CPPG-LGSI results we use the real training maize (*Zea mays*) F2 population with 248 genotypes, 233 molecular markers, and three traits, GY (ton ha$^{-1}$), EHT (cm), and PHT (cm), where $\widehat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix}$, $\widehat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}$, and $\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}$ were the estimated matrices of $\mathbf{P}$, $\mathbf{C}$, and $\boldsymbol{\Gamma}$ respectively.

We can obtain the estimated CPPG-LGSI vector of coefficients as $\widehat{\boldsymbol{\beta}}_{CP} = \widehat{\boldsymbol{\beta}}_{CR} + \widehat{\theta}_{CP}\widehat{\boldsymbol{\delta}}_{CP}$ (Eq. 6.16). Suppose that we restrict trait GY and its associated GEBV with matrix $\mathbf{U}'_{C_1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$ and the vector of predetermined restrictions $\mathbf{d}'_C = [7\ \ 3.5]$. In Sect. 6.3.2, we showed that the estimated CRLGSI vector of coefficients was $\widehat{\boldsymbol{\beta}}'_{CR} = [0.076\ \ {-0.004}\ \ {-0.018}\ \ 2.353\ \ {-0.096}\ \ {-0.082}]$; then, we only need to calculate $\widehat{\theta}_{CP}$ and $\widehat{\boldsymbol{\delta}}_{CP}$ to obtain the vector of coefficients $\widehat{\boldsymbol{\beta}}_{CP}$.

Let $\mathbf{w}' = [5\ \ {-0.1}\ \ {-0.1}\ \ 0\ \ 0\ \ 0]$ be the vector of economic weights. It can be shown that $\widehat{\theta}_{CP} = 0.00030$ is the estimated value of the proportionality constant and $\widehat{\boldsymbol{\delta}}'_{CP} = [0.56\ \ {-77.28}\ \ 40.89\ \ 49.44\ \ 77.28\ \ {-40.89}]$. Thus, the estimated CPPG-LGSI vector of coefficients was $\widehat{\boldsymbol{\beta}}'_{CP} = [0.076\ \ {-0.030}\ \ {-0.004}\ \ 2.369\ \ {-0.070}\ \ {-0.096}]$, whence the estimated CPPG-LGSI can be written as

$$
\begin{array}{l}
\widehat{I}_{CP} = 0.076\,\text{GY} - 0.030\,\text{EHT} - 0.004\,\text{PHT} + 2.369\,\text{GEBV}_{\text{GY}} - 0.070\,\text{GEBV}_{\text{EHT}} \\
\qquad\ - 0.096\,\text{GEBV}_{\text{PHT}}
\end{array}
$$

where $\text{GEBV}_{\text{GY}}$, $\text{GEBV}_{\text{EHT}}$, and $\text{GEBV}_{\text{PHT}}$ are the GEBVs associated with traits GY, EHT, and PHT respectively. The same procedure is valid for two or more restrictions. Note that because $\widehat{\theta}_{CP} = 0.0003$ is very small, the estimated CPPG-LGSI and CRLGSI values were very similar.

Figure 6.4 presents the frequency distribution of the estimated CPPG-LGSI values for one (Fig. 6.4a) and two predetermined restrictions (Fig. 6.4b) using matrices $\mathbf{U}'_{C_1}$ and $\mathbf{U}'_{C_2} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$, the vectors of the PPG $\mathbf{d}'_{C_1} = [7\ \ 3.5]$ and $\mathbf{d}'_{C_2} = [7\ \ 3\ \ 3.5\ \ 1.5]$, and the real data set F2. For both restrictions, the frequency distribution of the estimated CPPG-LGSI values approaches a normal distribution.

Fig. 6.4 Distribution of 244 estimated combined predetermined proportional gain linear genomic selection index (CPPG-LGSI) values with one (a) and two (b) predetermined restrictions, $d = 7$ and $\mathbf{d}' = [7\ \ 3]$ respectively, obtained in a real training population for one selection cycle in one environment

Suppose a selection intensity of 10% ($k_I = 1.755$) and that we restrict trait GY and its associated GEBV. The estimated CPPG-LGSI selection response and expected genetic gain per trait were $\widehat{R}_{CP} = k_I\sqrt{\widehat{\boldsymbol{\beta}}'_{CP}\widehat{\mathbf{T}}_C\widehat{\boldsymbol{\beta}}_{CP}} = 0.98$ and $\widehat{\mathbf{E}}'_{CP} = k_I\frac{\widehat{\boldsymbol{\beta}}'_{CP}\widehat{\boldsymbol{\Psi}}_C}{\sqrt{\widehat{\boldsymbol{\beta}}'_{CP}\widehat{\mathbf{T}}_C\widehat{\boldsymbol{\beta}}_{CP}}} = [0.007\ \ {-3.647}\ \ {-5.760}\ \ 0.004\ \ {-2.829}\ \ {-4.711}]$ respectively, whereas the estimated CPPG-LGSI accuracy was $\widehat{\rho}_{HI_{CP}} = \frac{\widehat{\sigma}_{I_{CP}}}{\widehat{\sigma}_H} = 0.52$. Once again, because $\widehat{\theta}_{CP} = 0.0003$, the latter results are very similar to the CRLGSI results.
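The CPPG-LGSI quantities for this example can be retraced in the same way as for the CRLGSI. The NumPy sketch below is again an illustration under assumptions, not the authors' code: it uses the rounded matrices printed in the text, $\boldsymbol{\Phi}'_C = \mathbf{U}'_C\boldsymbol{\Psi}_C$, economic weights $[5, -0.1, -0.1, 0, 0, 0]$, $\mathbf{d}'_C = [7\ \ 3.5]$, and no interval length. Small differences from the reported values are to be expected because of rounding.

```python
import numpy as np

# Estimated covariance matrices as printed in the text (rounded).
P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
G = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])

T = np.block([[P, G], [G, G]])
Psi = np.block([[C, G], [G, G]])

w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])   # economic weights (assumed signs)
U_t = np.array([[1, 0, 0, 0, 0, 0],
                [0, 0, 0, 1, 0, 0]])              # restrict GY and GEBV_GY
d_C = np.array([7.0, 3.5])                        # predetermined proportional gains
k_I = 1.755

T_inv = np.linalg.inv(T)
Phi = (U_t @ Psi).T                               # Phi_C
M_inv = np.linalg.inv(Phi.T @ T_inv @ Phi)        # (Phi_C' T_C^{-1} Phi_C)^{-1}

beta_C = T_inv @ Psi @ w                          # CLGSI coefficients
beta_CR = beta_C - T_inv @ Phi @ M_inv @ Phi.T @ beta_C        # Eq. (6.13)
delta_CP = T_inv @ Phi @ M_inv @ d_C
theta_CP = (beta_C @ Phi @ M_inv @ d_C) / (d_C @ M_inv @ d_C)  # Eq. (6.17)
beta_CP = beta_CR + theta_CP * delta_CP                        # Eq. (6.16)

sigma_I = np.sqrt(beta_CP @ T @ beta_CP)
R_CP = k_I * sigma_I                              # Eq. (6.18) without L_I
E_CP = k_I * (Psi @ beta_CP) / sigma_I            # Eq. (6.19) without L_I

print("theta_CP:", round(float(theta_CP), 5))
print("delta_CP:", np.round(delta_CP, 2))
print("beta_CP :", np.round(beta_CP, 3))
print("R_CP    :", round(float(R_CP), 2))
print("E_CP    :", np.round(E_CP, 3))
```

The first and fourth entries of `E_CP` are in the ratio 7 : 3.5, as imposed by `d_C`, while their absolute size is governed by the small value of `theta_CP`.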

Now, we use the simulated data described in Chap. 2, Sect. 2.8.1, to compare CPPG-LGSI efficiency versus PPG-LGSI efficiency. The criteria for this comparison are the Technow inequality (Chap. 5, Eq. 5.18) and the ratio of CPPG-LGSI accuracy ($\rho_{HI_{CP}}$) to PPG-LGSI accuracy ($\rho_{HI_P}$) expressed as percentages (Chap. 5, Eq. 5.17), $\widehat{p} = 100(\widehat{\lambda}_{CP} - 1)$, where $\widehat{\lambda}_{CP} = \widehat{\rho}_{HI_{CP}}/\widehat{\rho}_{HI_P}$, for one, two, and three predetermined restrictions in five simulated selection cycles.

Table 6.8 presents the estimated CPPG-LGSI heritability ($\widehat{h}^2_I$), the estimated PPG-LGSI accuracy ($\widehat{\rho}_{HI_P}$), and the values of $W_{CP} = \frac{\widehat{\rho}_{HI_G}}{\widehat{h}_I}L_I$ ($L_I = 4$) and $\widehat{p} = 100(\widehat{\lambda}_{CP} - 1)$, where $\widehat{\lambda}_{CP} = \widehat{\rho}_{HI_{CP}}/\widehat{\rho}_{HI_P}$ and $\widehat{\rho}_{HI_{CP}}$ is the estimated CPPG-LGSI accuracy, for one, two, and three predetermined restrictions in five simulated selection cycles. The averages of the estimated $W_{CP}$ values for one, two, and three predetermined restrictions were 3.60, 3.31, and 2.50 respectively, whereas the PPG-LGSI interval length was 1.5 ($L_G = 1.5$). This means that the estimated Technow inequality, $L_G < \frac{\widehat{\rho}_{HI_G}}{\widehat{h}_I}L_I$, was true. Thus, for this data set, PPG-LGSI efficiency is greater than CPPG-LGSI efficiency in terms of time.

The last three columns of Table 6.8, from left to right, present the values of $\widehat{p} = 100(\widehat{\lambda}_{CP} - 1)$ for one, two, and three predetermined restrictions in five simulated selection cycles. The average values of $\widehat{p}$ for each of the three restrictions, in percentage terms, were 37.19%, 32.82%, and 37.08% respectively. This means that the CPPG-LGSI was a better predictor of the net genetic merit than the PPG-LGSI.




### Chapter 7 Linear Phenotypic Eigen Selection Index Methods

Abstract Based on canonical correlation theory, the singular value decomposition (SVD), and linear phenotypic selection index theory, we describe the eigen selection index method (ESIM), the restricted ESIM (RESIM), and the predetermined proportional gain ESIM (PPG-ESIM), which use only phenotypic information to predict the net genetic merit. The ESIM is an unrestricted linear selection index, but the RESIM and PPG-ESIM are linear selection indices that allow null and predetermined restrictions respectively to be imposed on the expected genetic gains of some traits, whereas the rest remain without any restrictions. The aims of the three indices are to predict the unobservable net genetic merit values of the candidates for selection, to maximize the selection response and the accuracy, and to provide the breeder with an objective rule for evaluating and selecting several traits simultaneously. Their main characteristics are: they do not require the economic weights to be known; the first multi-trait heritability eigenvector is used as their vector of coefficients; and because of the properties associated with eigen analysis, it is possible to use the theory of similar matrices to change the direction and proportion of the expected genetic gain values without affecting the accuracy. We describe the foregoing three indices and validate their theoretical results using real and simulated data.

#### 7.1 The Linear Phenotypic Eigen Selection Index Method

The conditions described in Chap. 2 for the linear phenotypic selection index (LPSI) are necessary and sufficient for constructing the linear phenotypic eigen selection index method (ESIM). The ESIM index can be written as $I = \mathbf{b}'\mathbf{y}$, where $\mathbf{b}' = [b_1\ \ b_2\ \ \cdots\ \ b_t]$ is the unknown index vector of coefficients, $t$ is the number of traits, and $\mathbf{y}' = [y_1\ \ y_2\ \ \cdots\ \ y_t]$ is a known vector of trait phenotypic values. The objectives of ESIM are:

1. To predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$, where $\mathbf{g}' = [g_1\ \ g_2\ \ \ldots\ \ g_t]$ is the unknown vector of true breeding values for an individual and $\mathbf{w}' = [w_1\ \ w_2\ \ \ldots\ \ w_t]$ is a vector of unknown economic weights.


Although in the context of the LPSI w is a known and fixed vector of economic weights, in the ESIM w is fixed, but unknown and its values must be estimated in each selection cycle. This latter assumption is the fundamental difference between the ESIM and the LPSI and implies that the ESIM is more general than the LPSI. Thus, when w is known, the LPSI and ESIM give the same results.

#### 7.1.1 The ESIM Parameters

The theoretical ESIM selection response can be written as

$$R\_I = k\_I \sigma\_H \rho\_{HI},\tag{7.1}$$

where $k_I$ is the standardized selection differential (or selection intensity), $\sigma_H = \sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$ is the standard deviation of $H$, $\rho_{HI} = \frac{\mathbf{w}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ is the correlation and $\mathbf{w}'\mathbf{C}\mathbf{b} = \sigma_{HI}$ the covariance between $H$ and $I$ respectively, $\sigma_I = \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ is the standard deviation of $I$, $\mathbf{C}$ is the covariance matrix of the true breeding values ($\mathbf{g}$), and $\mathbf{P}$ is the covariance matrix of the trait phenotypic values ($\mathbf{y}$).

In the ESIM, it is assumed that $k_I$ and $\sigma_H$ are fixed, and that $\mathbf{C}$ and $\mathbf{P}$ are known; thus, to maximize Eq. (7.1), it is necessary to maximize $\rho^2_{HI} = \frac{(\mathbf{w}'\mathbf{C}\mathbf{b})^2}{(\mathbf{w}'\mathbf{C}\mathbf{w})(\mathbf{b}'\mathbf{P}\mathbf{b})}$ with respect to vectors $\mathbf{b}$ and $\mathbf{w}$ under the restrictions $\sigma^2_H = \mathbf{w}'\mathbf{C}\mathbf{w}$, $\sigma^2_I = \mathbf{b}'\mathbf{P}\mathbf{b}$, and $0 < \sigma^2_H, \sigma^2_I < \infty$, where $\sigma^2_H = \mathbf{w}'\mathbf{C}\mathbf{w}$ is the variance of $H = \mathbf{w}'\mathbf{g}$ and $\sigma^2_I = \mathbf{b}'\mathbf{P}\mathbf{b}$ is the variance of $I = \mathbf{b}'\mathbf{y}$. That is, it is necessary to maximize the function

$$f(\mathbf{b}, \mathbf{w}, \mu, \phi) = \left(\mathbf{w}' \mathbf{C} \mathbf{b}\right)^2 - \mu \left(\mathbf{b}' \mathbf{P} \mathbf{b} - \sigma\_I^2\right) - \phi \left(\mathbf{w}' \mathbf{C} \mathbf{w} - \sigma\_H^2\right) \tag{7.2}$$

with respect to b, w, μ, and ϕ, where μ and ϕ are Lagrange multipliers. The derivative results of Eq. (7.2) with respect to b, w, μ, and ϕ are:

$$(\mathbf{w}'\mathbf{C}\mathbf{b})\mathbf{C}\mathbf{w} - \mu \mathbf{P}\mathbf{b} = \mathbf{0}, \tag{7.3}$$

$$(\mathbf{w}'\mathbf{C}\mathbf{b})\mathbf{C}\mathbf{b} - \phi \mathbf{C}\mathbf{w} = \mathbf{0}, \tag{7.4}$$

$$\mathbf{b}'\mathbf{P}\mathbf{b} = \sigma\_I^2 \text{ and } \mathbf{w}'\mathbf{C}\mathbf{w} = \sigma\_H^2,\tag{7.5}$$

respectively, where Eq. (7.5) denotes the restrictions imposed for maximizing $\rho\_{HI}^2$. It can be shown that $\mathbf{w}'\mathbf{C}\mathbf{b} = \sqrt{\mu\sigma\_I^2} = \sqrt{\phi\sigma\_H^2} = \theta^{1/2}$; then, Eqs. (7.3) and (7.4) can be written as

$$
\theta^{1/2} \mathbf{C} \mathbf{w} - \frac{\theta}{\sigma\_I^2} \mathbf{P} \mathbf{b} = \mathbf{0} \tag{7.6}
$$

and

$$
\theta^{1/2} \mathbf{C} \mathbf{b} - \frac{\theta}{\sigma\_H^2} \mathbf{C} \mathbf{w} = \mathbf{0},
\tag{7.7}
$$

respectively. Equation (7.6) is equivalent to $\mathbf{C}\mathbf{w} = \frac{\theta^{1/2}}{\sigma\_I^2}\mathbf{P}\mathbf{b}$; then, vector $\mathbf{w}$ can be written as

$$\mathbf{w}\_{E} = \frac{\theta^{1/2}}{\sigma\_{I}^{2}} \mathbf{C}^{-1} \mathbf{P} \mathbf{b}. \tag{7.8}$$

By the result of Eq. (7.8), the net genetic merit in the ESIM context is $H\_E = \mathbf{w}\_E'\mathbf{g}$ and the correlation between $H\_E$ and $I$ is $\rho\_{H\_E I} = \frac{\mathbf{w}\_E'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{w}\_E'\mathbf{C}\mathbf{w}\_E}\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} = \frac{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}}}$. Now, it is necessary to find the vector $\mathbf{b}$ that maximizes $\rho\_{H\_E I}$, which should be the ESIM index vector of coefficients. Substituting $\mathbf{w}$ with $\mathbf{w}\_E$ in Eq. (7.7), we get

$$\mathbf{Cb} - \frac{\left(\mathbf{w}\_E' \mathbf{Cb}\right)^2}{\sigma\_I^2 \sigma\_{H\_E}^2} \mathbf{P} \mathbf{b} = \mathbf{0},\tag{7.9}$$

where $\frac{\left(\mathbf{w}\_E'\mathbf{C}\mathbf{b}\right)^2}{\sigma\_I^2\sigma\_{H\_E}^2} = \rho\_{H\_E I}^2$ is the square of the correlation between the ESIM and $H\_E = \mathbf{w}\_E'\mathbf{g}$. Let $\rho\_{H\_E I}^2 = \lambda\_E^2$; then Eq. (7.9) can be written as

$$(\mathbf{P}^{-1}\mathbf{C} - \lambda\_E^2 \mathbf{I})\mathbf{b}\_E = \mathbf{0},\tag{7.10}$$

and the optimized ESIM index is $I\_E = \mathbf{b}\_E'\mathbf{y}$. Note that in Eq. (7.10) $\mathbf{P}^{-1}\mathbf{C}$ is the multi-trait heritability. By Eqs. (7.8) and (7.10), the maximized correlation between $H\_E = \mathbf{w}\_E'\mathbf{g}$ and $I\_E = \mathbf{b}\_E'\mathbf{y}$ (or ESIM accuracy) can be written as

$$
\rho\_{H\_E I\_E} = \frac{\sigma\_{I\_E}}{\sigma\_{H\_E}},
\tag{7.11}
$$

where $\sigma\_{I\_E} = \sqrt{\mathbf{b}\_E'\mathbf{P}\mathbf{b}\_E}$ is the standard deviation of $I\_E = \mathbf{b}\_E'\mathbf{y}$, and $\sigma\_{H\_E} = \sqrt{\mathbf{b}\_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_E}$ is the standard deviation of $H\_E = \mathbf{w}\_E'\mathbf{g}$. Hereafter, we write Eq. (7.11) as $\rho\_E = \rho\_{H\_E I\_E}$ or $\lambda\_E = \rho\_{H\_E I\_E}$ to simplify the notation.

An additional restriction on Eq. (7.10) is $\mathbf{b}'\mathbf{b} = 1$, because $\rho\_{H\_E I\_E}$ is invariant to the scale change and because if $\mathbf{b}\_E$ is an eigenvector of the multi-trait heritability matrix $\mathbf{P}^{-1}\mathbf{C}$, vector $\alpha\mathbf{b}\_E$ is also an eigenvector of $\mathbf{P}^{-1}\mathbf{C}$ for all real values of $\alpha$ (Mardia et al. 1982). This means that in the ESIM the magnitude of an eigenvector is unimportant; only the direction matters (Watkins 2002). Equation (7.10) can also be written as $\mathbf{C}\mathbf{b}\_E = \lambda\_E^2\mathbf{P}\mathbf{b}\_E$, which is called the generalized eigenvalue problem (Watkins 2002). In the latter case, $\mathbf{b}\_E$ is called a generalized eigenvector and $\lambda\_E^2$ a generalized eigenvalue. The generalized eigenvalues may not exist; that is, they may be infinite. However, if $\mathbf{P}$ is positive definite and has the same size as $\mathbf{C}$, all eigenvalues of $\mathbf{P}^{-1}\mathbf{C}$ exist and are finite (Gentle 2007). Matrix $\mathbf{P}$ is symmetric and positive definite and its eigenvalues are different with a probability of 1 if the number of genotypes is higher than the number of traits (Okamoto 1973).
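As a minimal numerical sketch of Eq. (7.10), the eigenvalue problem can be solved directly with NumPy. The matrices below are assumed illustrative values (patterned on the poultry example of Sect. 7.1.5, with the signs of the off-diagonal entries taken as an assumption of this sketch), and the helper name `esim_eigen` is hypothetical rather than part of any ESIM software.

```python
import numpy as np

# Assumed illustrative covariance matrices for three traits.
P = np.array([[240.57, -95.62, 2.07],
              [-95.62, 167.20, 4.58],
              [2.07, 4.58, 22.80]])
C = np.array([[29.86, -17.90, -4.13],
              [-17.90, 18.56, 1.49],
              [-4.13, 1.49, 9.24]])


def esim_eigen(P, C):
    """Largest root lambda^2 and unit-length b of (P^{-1}C - lambda^2 I) b = 0."""
    T = np.linalg.solve(P, C)            # multi-trait heritability P^{-1} C
    eigvals, eigvecs = np.linalg.eig(T)  # T is asymmetric: use the general solver
    eigvals, eigvecs = eigvals.real, eigvecs.real
    j = int(np.argmax(eigvals))
    b = eigvecs[:, j] / np.linalg.norm(eigvecs[:, j])  # impose b'b = 1
    return eigvals[j], b


lambda2, b_E = esim_eigen(P, C)

# ESIM accuracy from Eq. (7.11): sigma_I / sigma_H with sigma_H^2 = b'P C^{-1} P b.
sigma_I = np.sqrt(b_E @ P @ b_E)
sigma_H = np.sqrt(b_E @ P @ np.linalg.solve(C, P @ b_E))
accuracy = sigma_I / sigma_H
```

In this sketch `accuracy**2` coincides with `lambda2`, and `C @ b_E` equals `lambda2 * P @ b_E`, which is the generalized form of the problem. Note that this direct eigen-solve is a different route from the SVD-based estimation described in Sect. 7.1.4.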

If the heritability of the ESIM is $h\_I^2 = \frac{\mathbf{b}'\mathbf{C}\mathbf{b}}{\mathbf{b}'\mathbf{P}\mathbf{b}}$, then another way of writing Eq. (7.1) is

$$R\_I = k\_I \sigma\_I h\_I^2 = k\_I \frac{\mathbf{b}' \mathbf{C} \mathbf{b}}{\sqrt{\mathbf{b}' \mathbf{P} \mathbf{b}}},\tag{7.12}$$

which is similar to the univariate breeder's equation (see Chap. 2, Eq. 2.4). All the parameters of Eq. (7.12) were defined earlier.

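Setting the derivative of the ratio $\frac{\mathbf{b}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ (Eq. 7.12) with respect to $\mathbf{b}$ equal to zero gives $2\left(\mathbf{b}'\mathbf{P}\mathbf{b}\right)^{1/2}\mathbf{C}\mathbf{b} - \left(\mathbf{b}'\mathbf{P}\mathbf{b}\right)^{-1/2}\left(\mathbf{b}'\mathbf{C}\mathbf{b}\right)\mathbf{P}\mathbf{b} = \mathbf{0}$, and, up to a proportionality constant, the result is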

$$\left(\mathbf{P}^{-1}\mathbf{C} - h\_{I\_E}^2 \mathbf{I}\right) \mathbf{b}\_E = \mathbf{0},\tag{7.13}$$

where $h\_{I\_E}^2 = \frac{\mathbf{b}\_E'\mathbf{C}\mathbf{b}\_E}{\mathbf{b}\_E'\mathbf{P}\mathbf{b}\_E}$ is the maximized ESIM heritability. Let $\lambda\_E^2 = \rho\_E^2 = h\_{I\_E}^2$; then Eq. (7.13) is equal to Eq. (7.10) and can be written as $\mathbf{b}\_E'\mathbf{C}\mathbf{b}\_E = \lambda\_E^2\mathbf{b}\_E'\mathbf{P}\mathbf{b}\_E$, whence the maximized $\rho\_E^2$ in terms of $h\_{I\_E}^2$ is

$$
\rho\_E^2 = \frac{\mathbf{b}\_E' \mathbf{C} \mathbf{b}\_E}{\mathbf{b}\_E' \mathbf{P} \mathbf{b}\_E},\tag{7.14}
$$

which should give a result equivalent to that of Eq. (7.11).
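A short sketch of why Eqs. (7.11) and (7.14) agree, assuming that $\mathbf{C}$ is nonsingular (as Eq. (7.8) already requires) and that $\lambda\_E^2 > 0$. Starting from the form $\mathbf{C}\mathbf{b}\_E = \lambda\_E^2\mathbf{P}\mathbf{b}\_E$ of Eq. (7.10):

$$
\mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_E = \lambda\_E^{-2}\mathbf{b}\_E
\;\Rightarrow\;
\mathbf{b}\_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_E = \lambda\_E^{-2}\,\mathbf{b}\_E'\mathbf{P}\mathbf{b}\_E
\;\Rightarrow\;
\rho\_{H\_E I\_E}^2 = \frac{\mathbf{b}\_E'\mathbf{P}\mathbf{b}\_E}{\mathbf{b}\_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_E} = \lambda\_E^2 = \frac{\mathbf{b}\_E'\mathbf{C}\mathbf{b}\_E}{\mathbf{b}\_E'\mathbf{P}\mathbf{b}\_E}.
$$

The last equality is Eq. (7.14), so the squared accuracy and the maximized heritability are the same quantity.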

By Eq. (7.11) and $\sigma\_{H\_E} = \sqrt{\mathbf{b}\_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_E}$, the maximized ESIM selection response and expected genetic gain per trait can be written as

$$R\_E = k\_I \sqrt{\mathbf{b}\_E' \mathbf{P} \mathbf{b}\_E} \tag{7.15}$$

and

$$\mathbf{E}\_E = k\_I \frac{\mathbf{C} \mathbf{b}\_E}{\sqrt{\mathbf{b}\_E' \mathbf{P} \mathbf{b}\_E}},\tag{7.16}$$

respectively. Equations (7.15) and (7.16) do not require the economic weights to be known. In the original derivation of the ESIM, Cerón-Rojas et al. (2008) imposed the restrictions $\sigma\_{H\_E}^2 = 1$ and $\sigma\_{I\_E}^2 = 1$. Under these restrictions, $\lambda\_E = \mathbf{w}\_E'\mathbf{C}\mathbf{b}\_E$ and Eq. (7.15) can be written as $R\_E = k\_I\lambda\_E$. When $\sigma\_{H\_E}^2 \neq 1$, Eq. (7.15) is equal to $R\_E = k\_I\sigma\_{H\_E}\lambda\_E$, where $\sigma\_{H\_E} = \sqrt{\mathbf{b}\_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_E}$ and $\lambda\_E^2 = \rho\_E^2 = h\_{I\_E}^2$.

Let $\mathbf{T} = \mathbf{P}^{-1}\mathbf{C}$ and $\lambda\_E^2 = h\_{I\_E}^2$; then, Eq. (7.13) can be written as $\mathbf{T}\mathbf{I}\mathbf{b}\_E = \lambda\_E^2\mathbf{I}\mathbf{b}\_E$, where $\mathbf{I} = \mathbf{F}^{-1}\mathbf{F}$ is an identity matrix of size $t \times t$ ($t$ = number of traits), and $\mathbf{F} = \operatorname{diag}\{f\_1 \; f\_2 \; \cdots \; f\_t\}$ is a diagonal matrix whose diagonal values can be any real numbers except zero. Thus, another way of writing Eqs. (7.10) and (7.13) is

$$(\mathbf{T}\_2 - \lambda\_E^2 \mathbf{I})\boldsymbol{\beta} = \mathbf{0},\tag{7.17}$$

where $\mathbf{T}\_2 = \mathbf{F}\mathbf{T}\mathbf{F}^{-1}$ and $\boldsymbol{\beta} = \mathbf{F}\mathbf{b}\_E$; $\mathbf{T}$ and $\mathbf{T}\_2 = \mathbf{F}\mathbf{T}\mathbf{F}^{-1}$ are similar matrices and both have the same eigenvalues but different eigenvectors (Harville 1997). When the $\mathbf{F}$ values are only 1s, vector $\mathbf{b}\_E$ is not affected; when the $\mathbf{F}$ values are only $-1$s, vector $\mathbf{b}\_E$ changes its direction; and if the $\mathbf{F}$ values are different from 1 and $-1$, matrix $\mathbf{F}$ changes the proportional values of $\mathbf{b}\_E$. In practice, $\mathbf{b}\_E$ is first obtained from Eq. (7.13) and then multiplied by matrix $\mathbf{F}$ to obtain $\boldsymbol{\beta} = \mathbf{F}\mathbf{b}\_E$, that is, $\boldsymbol{\beta}$ is a linear transformation of $\mathbf{b}\_E$. Matrix $\mathbf{T}\_2 = \mathbf{F}\mathbf{T}\mathbf{F}^{-1}$ is called the similarity transformation, and matrix $\mathbf{F}$ is called the transforming matrix (Watkins 2002). Cerón-Rojas et al. (2006) introduced an alternative procedure for modifying the $\mathbf{b}\_E$ signs that is a particular case of Eq. (7.17). Vector $\boldsymbol{\beta} = \mathbf{F}\mathbf{b}\_E$ can be substituted for $\mathbf{b}\_E$ in Eqs. (7.15) and (7.16); in this case, the optimized ESIM index should be written as $I\_E = \boldsymbol{\beta}'\mathbf{y}$.
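The similarity transformation of Eq. (7.17) can be checked numerically. The sketch below uses assumed illustrative matrices and a hypothetical transforming matrix `F`; it only verifies that $\mathbf{T}$ and $\mathbf{F}\mathbf{T}\mathbf{F}^{-1}$ share eigenvalues and that $\mathbf{F}\mathbf{b}\_E$ is an eigenvector of the transformed matrix.

```python
import numpy as np

# Assumed illustrative covariance matrices for three traits.
P = np.array([[240.57, -95.62, 2.07],
              [-95.62, 167.20, 4.58],
              [2.07, 4.58, 22.80]])
C = np.array([[29.86, -17.90, -4.13],
              [-17.90, 18.56, 1.49],
              [-4.13, 1.49, 9.24]])

T = np.linalg.solve(P, C)             # T = P^{-1} C
F = np.diag([-9.0, 1.0, 1.0])         # hypothetical diagonal transforming matrix
T2 = F @ T @ np.linalg.inv(F)         # similarity transformation

vals_T, vecs_T = np.linalg.eig(T)
vals_T2 = np.linalg.eigvals(T2)

j = int(np.argmax(vals_T.real))
b_E = vecs_T[:, j].real               # eigenvector of T for the largest root
beta = F @ b_E                        # transformed vector beta = F b_E

same_eigenvalues = np.allclose(np.sort(vals_T.real), np.sort(vals_T2.real))
beta_is_eigenvector = np.allclose(T2 @ beta, vals_T[j].real * beta)
```

Both flags come out true because $\mathbf{T}\_2\boldsymbol{\beta} = \mathbf{F}\mathbf{T}\mathbf{F}^{-1}\mathbf{F}\mathbf{b}\_E = \lambda\_E^2\mathbf{F}\mathbf{b}\_E$; the eigenvalue, and hence the accuracy, is untouched by the choice of `F`.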

#### 7.1.2 Statistical ESIM Properties

The ratio of the index accuracies and the variance of the predicted error (VPE) are good criteria for comparing the index efficiencies for predicting the net genetic merit (see Chap. 2 for details). In Eq. (7.11), we obtained the accuracy of the ESIM; now, we derive the VPE of the ESIM.

The variance of $I\_E = \mathbf{b}\_E'\mathbf{y}$ ($\sigma\_{I\_E}^2$) and the covariance between $H\_E = \mathbf{w}\_E'\mathbf{g}$ and $I\_E = \mathbf{b}\_E'\mathbf{y}$ ($\sigma\_{H\_E I\_E}$) are the same, that is,

$$
\sigma\_{I\_E}^2 = \mathbf{b}\_E' \mathbf{P} \mathbf{b}\_E \text{ and } \sigma\_{H\_E I\_E} = \mathbf{w}\_E' \mathbf{C} \mathbf{b}\_E = \mathbf{b}\_E' \mathbf{P} \mathbf{C}^{-1} \mathbf{C} \mathbf{b}\_E = \mathbf{b}\_E' \mathbf{P} \mathbf{b}\_E,\tag{7.18}
$$

respectively; that is, $\sigma\_{I\_E}^2 = \sigma\_{H\_E I\_E}$. By Eq. (7.18), the VPE of the ESIM can be written as

$$E\left[\left(H\_E - I\_E\right)^2\right] = \sigma\_{H\_E}^2 + \sigma\_{I\_E}^2 - 2\sigma\_{H\_E I\_E} = \sigma\_{H\_E}^2 - \sigma\_{I\_E}^2 = \left(1 - \rho\_E^2\right)\sigma\_{H\_E}^2. \tag{7.19}$$

The relative effectiveness of $I\_E = \mathbf{b}\_E'\mathbf{y}$ in predicting $H\_E = \mathbf{w}\_E'\mathbf{g}$ is the ratio of $\left(1 - \rho\_E^2\right)\sigma\_{H\_E}^2$ over $\sigma\_{H\_E}^2$, i.e., $1 - \rho\_E^2$; thus, the greater $\rho\_E^2$ is, the more effective $I\_E = \mathbf{b}\_E'\mathbf{y}$ is at predicting $H\_E = \mathbf{w}\_E'\mathbf{g}$. The mean squared effect of $I\_E$ on $H\_E$, or the total variance of $H\_E$ explained by $I\_E$, is

$$
\sigma\_{I\_E}^2 = \rho\_E^2 \sigma\_{H\_E}^2,\tag{7.20}
$$

and the relative mean squared effect can be measured by $\rho\_E^2$ (Anderson 2003). If in Eq. (7.20) $\rho\_E^2 = 1$, then $\sigma\_{I\_E}^2 = \sigma\_{H\_E}^2$, and if $\rho\_E^2 = 0$, then $\sigma\_{I\_E}^2 = 0$. That is, the variance of $H\_E$ explained by $I\_E$ is proportional to $\rho\_E^2$: when $\rho\_E^2$ is close to 1, $\sigma\_{I\_E}^2$ is close to $\sigma\_{H\_E}^2$, and when $\rho\_E^2$ is close to 0, $\sigma\_{I\_E}^2$ is close to 0. All these results are valid for any index associated with the ESIM, such as the restricted ESIM (RESIM) and the predetermined proportional gains ESIM (PPG-ESIM), which are described in the following sections of this chapter.
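The identities in Eqs. (7.18) to (7.20) can be verified numerically. The sketch below assumes illustrative covariance matrices and takes the scalar $\theta^{1/2}/\sigma\_I^2$ of Eq. (7.8) equal to 1, as Eq. (7.18) implicitly does; all variable names are hypothetical.

```python
import numpy as np

# Assumed illustrative covariance matrices for three traits.
P = np.array([[240.57, -95.62, 2.07],
              [-95.62, 167.20, 4.58],
              [2.07, 4.58, 22.80]])
C = np.array([[29.86, -17.90, -4.13],
              [-17.90, 18.56, 1.49],
              [-4.13, 1.49, 9.24]])

# Leading eigenpair of P^{-1} C gives b_E and rho_E^2 (Eq. 7.10).
vals, vecs = np.linalg.eig(np.linalg.solve(P, C))
j = int(np.argmax(vals.real))
rho2 = vals[j].real
b_E = vecs[:, j].real

# w_E = C^{-1} P b_E (Eq. 7.8 with the scalar factor assumed equal to 1).
w_E = np.linalg.solve(C, P @ b_E)

var_I = b_E @ P @ b_E            # variance of I_E
var_H = w_E @ C @ w_E            # variance of H_E
cov_HI = w_E @ C @ b_E           # covariance between H_E and I_E

vpe = var_H + var_I - 2.0 * cov_HI   # E[(H_E - I_E)^2], Eq. (7.19)
```

With these quantities, `var_I` equals `cov_HI`, `vpe` equals `(1 - rho2) * var_H`, and `var_I` equals `rho2 * var_H`, up to floating-point error.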

#### 7.1.3 The ESIM and the Canonical Correlation Theory

Canonical correlation theory describes the associations between two sets of variables (Hotelling 1935, 1936) and searches for linear combinations, called canonical variables, of each of two sets of variables having maximal correlation. The vector of coefficients of each of these linear combinations is called the canonical vector, and the correlation between the canonical variables is called the canonical correlation (Wilms and Croux 2016).

To see how the ESIM and the canonical correlation theory are related, note that vectors $\mathbf{y}$ and $\mathbf{g}$ (Eq. 7.1) can be ordered in a new vector $\mathbf{x}$ as $\mathbf{x}' = \left[\mathbf{y}' \;\; \mathbf{g}'\right]$, whence the covariance matrix of $\mathbf{x}$ is $\begin{bmatrix} \mathbf{P} & \mathbf{C} \\ \mathbf{C} & \mathbf{C} \end{bmatrix}$. One measure of the association between the $j$th linear combination of $\mathbf{y}$ ($I\_E = \mathbf{b}\_{Ej}'\mathbf{y}$) and the $j$th linear combination of $\mathbf{g}$ ($H\_E = \mathbf{w}\_{Ej}'\mathbf{g}$) is the $j$th canonical correlation ($\lambda\_j$) value obtained from equation $\left(\mathbf{P}^{-1}\mathbf{C} - \lambda\_j^2\mathbf{I}\right)\mathbf{b}\_{Ej} = \mathbf{0}$, where $\mathbf{b}\_{Ej}$ is the $j$th canonical vector ($j = 1, 2, \ldots, t$) of matrix $\mathbf{P}^{-1}\mathbf{C}$, and $\mathbf{w}\_{Ej} = \mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_{Ej}$. Thus, in the canonical correlation context, $I\_E = \mathbf{b}\_{Ej}'\mathbf{y}$ and $H\_E = \mathbf{w}\_{Ej}'\mathbf{g}$ are canonical variables.
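The link to the usual canonical correlation setup can be made explicit in one step. In that setup, the squared canonical correlations between two sets of variables are the eigenvalues of $\boldsymbol{\Sigma}\_{yy}^{-1}\boldsymbol{\Sigma}\_{yg}\boldsymbol{\Sigma}\_{gg}^{-1}\boldsymbol{\Sigma}\_{gy}$; the block symbols $\boldsymbol{\Sigma}\_{yy}$, $\boldsymbol{\Sigma}\_{yg}$, $\boldsymbol{\Sigma}\_{gy}$, and $\boldsymbol{\Sigma}\_{gg}$ are notation assumed here for the four blocks of the covariance matrix of $\mathbf{x}$. With $\boldsymbol{\Sigma}\_{yy} = \mathbf{P}$ and $\boldsymbol{\Sigma}\_{yg} = \boldsymbol{\Sigma}\_{gy} = \boldsymbol{\Sigma}\_{gg} = \mathbf{C}$ (and $\mathbf{C}$ nonsingular), this product reduces to

$$
\boldsymbol{\Sigma}\_{yy}^{-1}\boldsymbol{\Sigma}\_{yg}\boldsymbol{\Sigma}\_{gg}^{-1}\boldsymbol{\Sigma}\_{gy} = \mathbf{P}^{-1}\mathbf{C}\mathbf{C}^{-1}\mathbf{C} = \mathbf{P}^{-1}\mathbf{C},
$$

which is the matrix whose eigenvalues and eigenvectors are sought in Eq. (7.10).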

In the ESIM, the first eigenvector ($\mathbf{b}\_{E1}$) of matrix $\mathbf{P}^{-1}\mathbf{C}$ should be used in $I\_E = \mathbf{b}\_{E1}'\mathbf{y}$; the first eigenvalue ($\lambda\_1^2$) and $\mathbf{b}\_{E1}$ of $\mathbf{P}^{-1}\mathbf{C}$ should be used in the ESIM selection response and in the ESIM expected genetic gain per trait, because, in this case, the ESIM has maximum accuracy compared with other indices, such as the LPSI. The latter results in this subsection imply that the sampling statistical properties associated with the canonical correlation theory are also valid for the ESIM.

#### 7.1.4 Estimated ESIM Parameters and Their Sampling Properties

The estimated covariance matrix of the true breeding values ($\mathbf{C}$) and that of the trait phenotypic values ($\mathbf{P}$) are denoted as $\widehat{\mathbf{C}}$ and $\widehat{\mathbf{P}}$ respectively; they can be obtained by restricted maximum likelihood using Eqs. (2.22) to (2.24) described in Chap. 2. With matrices $\widehat{\mathbf{C}}$ and $\widehat{\mathbf{P}}$, we constructed matrix $\widehat{\mathbf{T}} = \widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}$ and equation

$$(\widehat{\mathbf{T}} - \widehat{\boldsymbol{\lambda}}\_{Ej}^{2} \mathbf{I}) \widehat{\mathbf{b}}\_{Ej} = \mathbf{0},\tag{7.21}$$

$j = 1, 2, \ldots, t$, where $t$ is the number of traits in the ESIM index. Note that $\widehat{\lambda}\_{Ej}^2$ is positive only if $\widehat{\mathbf{P}}$ is positive definite (all eigenvalues positive) and $\widehat{\mathbf{C}}$ is positive semidefinite (no negative eigenvalues); in addition, as $\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}$ is an asymmetric matrix, the values of $\widehat{\mathbf{b}}\_{Ej}$ and $\widehat{\lambda}\_{Ej}^2$ should be obtained using the singular value decomposition (SVD) theory (Anderson 2003).

Matrix $\widehat{\mathbf{T}}$ is square and asymmetric of order $t \times t$ and rank $q \leq \min(p, c)$, where $p$ and $c$ denote the rank of $\widehat{\mathbf{P}}^{-1}$ and $\widehat{\mathbf{C}}$ respectively; the rank of $\widehat{\mathbf{T}}$ is equal to $c$ only if $\widehat{\mathbf{C}}$ is square and nonsingular. Thus, matrix $\widehat{\mathbf{T}}$ has a maximum of $q$ eigenvalues different from zero (Rao 2002). In addition, $\widehat{\mathbf{T}}\widehat{\mathbf{T}}'$ and $\widehat{\mathbf{T}}'\widehat{\mathbf{T}}$ are symmetric matrices, but $\widehat{\mathbf{T}}\widehat{\mathbf{T}}' \neq \widehat{\mathbf{T}}'\widehat{\mathbf{T}}$. Using the SVD theory, matrix $\widehat{\mathbf{T}}$ can be written as

$$
\hat{\mathbf{T}} = \mathbf{V}\_1 \mathbf{L}^{1/2} \mathbf{V}\_2',\tag{7.22}
$$

where $\mathbf{V}\_1$ ($\mathbf{V}\_1'\mathbf{V}\_1 = \mathbf{V}\_1\mathbf{V}\_1' = \mathbf{I}\_q$) and $\mathbf{V}\_2$ ($\mathbf{V}\_2'\mathbf{V}\_2 = \mathbf{V}\_2\mathbf{V}\_2' = \mathbf{I}\_q$) are matrices with the eigenvectors of matrices $\widehat{\mathbf{T}}\widehat{\mathbf{T}}'$ and $\widehat{\mathbf{T}}'\widehat{\mathbf{T}}$ respectively; $\mathbf{L}^{1/2}$ is a diagonal matrix with the square roots ($\widehat{\lambda}\_{E1}^2 \geq \widehat{\lambda}\_{E2}^2 \geq \cdots \geq \widehat{\lambda}\_{Eq}^2 > 0$) of the eigenvalues of either $\widehat{\mathbf{T}}\widehat{\mathbf{T}}'$ or $\widehat{\mathbf{T}}'\widehat{\mathbf{T}}$ (the eigenvalues of $\widehat{\mathbf{T}}\widehat{\mathbf{T}}'$ and $\widehat{\mathbf{T}}'\widehat{\mathbf{T}}$ are the same). The entries $\widehat{\lambda}\_{E1}^2 \geq \widehat{\lambda}\_{E2}^2 \geq \cdots \geq \widehat{\lambda}\_{Eq}^2 > 0$ of $\mathbf{L}^{1/2}$ are uniquely determined, and they are called the singular values of $\widehat{\mathbf{T}}$. The columns of $\mathbf{V}\_1$ are orthonormal vectors called left singular vectors of $\widehat{\mathbf{T}}$, and the columns of $\mathbf{V}\_2$ are called right singular vectors (Watkins 2002).

Estimators $\widehat{\mathbf{b}}\_{E1}$ and $\widehat{\lambda}\_{E1}^2$ of the first eigenvector $\mathbf{b}\_{E1}$ and the first eigenvalue $\lambda\_{E1}^2$ are, respectively, the first column of matrix $\mathbf{V}\_1$ and the first diagonal element of matrix $\mathbf{L}^{1/2}$. Thus, because $\widehat{\mathbf{T}}\widehat{\mathbf{T}}'$ is a symmetric matrix, the maximum likelihood estimators $\widehat{\lambda}\_{E1}^2$ and $\widehat{\mathbf{b}}\_{E1}$ in the ESIM context can be obtained from

$$(\widehat{\mathbf{T}}\widehat{\mathbf{T}}' - \widehat{\mu}\_{j}\mathbf{I})\widehat{\mathbf{b}}\_{Ej} = \mathbf{0},\tag{7.23}$$

where $\widehat{\mu}\_j = \widehat{\lambda}\_{Ej}^4$, $j = 1, 2, \ldots, t$. In the asymptotic context, $\widehat{\lambda}\_{E1}^2$ and $\widehat{\mathbf{b}}\_{E1}$ are consistent and unbiased estimators (Anderson 2003).
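The SVD route of Eqs. (7.22) and (7.23) can be sketched with NumPy as follows. The matrices are assumed values patterned on the poultry example of Sect. 7.1.5 (off-diagonal signs are an assumption of this sketch), and the variable names are hypothetical.

```python
import numpy as np

# Assumed estimated covariance matrices (three traits).
P_hat = np.array([[240.57, -95.62, 2.07],
                  [-95.62, 167.20, 4.58],
                  [2.07, 4.58, 22.80]])
C_hat = np.array([[29.86, -17.90, -4.13],
                  [-17.90, 18.56, 1.49],
                  [-4.13, 1.49, 9.24]])

T_hat = np.linalg.solve(P_hat, C_hat)          # T_hat = P_hat^{-1} C_hat

# Eq. (7.22): T_hat = V1 @ diag(sing_vals) @ V2t, singular values in descending order.
V1, sing_vals, V2t = np.linalg.svd(T_hat)

# Eq. (7.23): symmetric eigenproblem of T_hat T_hat'; eigh returns ascending eigenvalues.
mu, vecs = np.linalg.eigh(T_hat @ T_hat.T)

lambda2_E1 = sing_vals[0]        # first diagonal element of L^{1/2}
b_E1 = V1[:, 0]                  # first left singular vector (sign is arbitrary)
lambda_E1 = np.sqrt(lambda2_E1)  # estimated ESIM accuracy
```

The largest entry of `mu` is the square of `sing_vals[0]`, and `b_E1` matches the corresponding column of `vecs` up to sign. Because the sign of a singular vector is arbitrary, it may need to be flipped before the index is interpreted.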

The latter results allow the ESIM index ($I\_E = \mathbf{b}\_E'\mathbf{y}$) to be estimated as $\widehat{I}\_E = \widehat{\mathbf{b}}\_{E1}'\mathbf{y}$. The estimators of the maximized ESIM selection response and expected genetic gain per trait are $\widehat{R}\_E = k\_I\sqrt{\widehat{\mathbf{b}}\_{E1}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_{E1}}$ and $\widehat{\mathbf{E}}\_E = k\_I\frac{\widehat{\mathbf{C}}\widehat{\mathbf{b}}\_{E1}}{\sqrt{\widehat{\mathbf{b}}\_{E1}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_{E1}}}$ respectively, whereas the estimator of the maximized ESIM accuracy is $\widehat{\lambda}\_{E1}$, which should be similar to the estimator of the square root of the maximized ESIM heritability.
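A minimal sketch of these two estimators follows. The function name `esim_response_and_gain` is hypothetical; the covariance matrices are assumed values patterned on Sect. 7.1.5 (off-diagonal signs assumed), and the coefficient vector is the rounded first eigenvector reported in that example with the sign pattern assumed here.

```python
import numpy as np


def esim_response_and_gain(b, P_hat, C_hat, k_I):
    """Return (R_E, E_E): selection response and expected genetic gain per trait."""
    sd_index = np.sqrt(b @ P_hat @ b)        # standard deviation of the index
    R_E = k_I * sd_index                     # estimator of Eq. (7.15)
    E_E = k_I * (C_hat @ b) / sd_index       # estimator of Eq. (7.16)
    return R_E, E_E


# Assumed inputs (three traits: RL, SM, EW).
P_hat = np.array([[240.57, -95.62, 2.07],
                  [-95.62, 167.20, 4.58],
                  [2.07, 4.58, 22.80]])
C_hat = np.array([[29.86, -17.90, -4.13],
                  [-17.90, 18.56, 1.49],
                  [-4.13, 1.49, 9.24]])
b_E1 = np.array([-0.1701, 0.0259, 0.9851])   # assumed first eigenvector
k_I = 1.755                                  # 10% selection intensity

R_E, E_E = esim_response_and_gain(b_E1, P_hat, C_hat, k_I)
```

Neither quantity uses economic weights; only the coefficient vector, the two covariance matrices, and the selection intensity enter the calculation.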

In the asymptotic context, the estimator of $\mathbf{b}\_{Ej}$ ($\widehat{\mathbf{b}}\_{Ej}$) has a multivariate normal distribution with expectation $E\left(\widehat{\mathbf{b}}\_{Ej}\right) = \mathbf{b}\_{Ej}$ and variance

$$\operatorname{Var}\left(\widehat{\mathbf{b}}\_{Ej}\right) = \frac{1}{2n}\mathbf{b}\_{Ej}\mathbf{b}\_{Ej}^{\prime} + \frac{1}{n}\left(1 - \lambda\_{Ej}^{2}\right)\sum\_{i\neq j}^{t}\frac{\lambda\_{Ej}^{2} + \lambda\_{Ei}^{2} - 2\lambda\_{Ei}^{2}\lambda\_{Ej}^{2}}{\left(\lambda\_{Ei}^{2} - \lambda\_{Ej}^{2}\right)^{2}}\mathbf{b}\_{Ei}\mathbf{b}\_{Ei}^{\prime},\tag{7.24}$$

and, for $i \neq j$, the covariance between $\widehat{\mathbf{b}}\_{Ei}$ and $\widehat{\mathbf{b}}\_{Ej}$ can be written as

$$Cov\left(\hat{\mathbf{b}}\_{Ei}, \hat{\mathbf{b}}\_{Ej}\right) = \frac{\left(1 - \lambda\_{Ej}^2\right)\left(1 - \lambda\_{Ei}^2\right)\left(\lambda\_{Ei}^2 + \lambda\_{Ej}^2\right)}{n\left(\lambda\_{Ei}^2 - \lambda\_{Ej}^2\right)^2} \mathbf{b}\_{Ej} \mathbf{b}\_{Ei}',\tag{7.25}$$

where $n$ is the number of individuals or genotypes (Anderson 1999). The variance of $\widehat{\mathbf{b}}\_{Ej}$ and the covariance between $\widehat{\mathbf{b}}\_{Ei}$ and $\widehat{\mathbf{b}}\_{Ej}$ depend not only on $n$, but also on eigenvalues $\lambda\_{Ei}^2$ and $\lambda\_{Ej}^2$. Suppose that $\lambda\_{Ej}^2 > \lambda\_{Ei}^2$; then, when $\lambda\_{Ej}^2$ is very close to 1, $\operatorname{Var}\left(\widehat{\mathbf{b}}\_{Ej}\right) \approx \frac{1}{2n}\mathbf{b}\_{Ej}\mathbf{b}\_{Ej}'$ ("$\approx$" denotes an approximation) and $\operatorname{Cov}\left(\widehat{\mathbf{b}}\_{Ei}, \widehat{\mathbf{b}}\_{Ej}\right)$ is very close to $\mathbf{0}$. By the result of Eq. (7.24), the variance of the first eigenvector ($\widehat{\mathbf{b}}\_{E1}$) of $\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}$ can be written as $\operatorname{Var}\left(\widehat{\mathbf{b}}\_{E1}\right) = \frac{1}{2n}\mathbf{b}\_{E1}\mathbf{b}\_{E1}' + \frac{1}{n}\left(1 - \lambda\_{E1}^2\right)\sum\_{j=2}^{t}\frac{\lambda\_{E1}^2 + \lambda\_{Ej}^2 - 2\lambda\_{E1}^2\lambda\_{Ej}^2}{\left(\lambda\_{E1}^2 - \lambda\_{Ej}^2\right)^2}\mathbf{b}\_{Ej}\mathbf{b}\_{Ej}'$. If the first eigenvalue $\lambda\_{E1}^2$ of $\mathbf{P}^{-1}\mathbf{C}$ is very close to 1 ($\lambda\_{E1}^2 \approx 1$), $\operatorname{Var}\left(\widehat{\mathbf{b}}\_{E1}\right) \approx \frac{1}{2n}\mathbf{b}\_{E1}\mathbf{b}\_{E1}'$ and $\operatorname{Cov}\left(\widehat{\mathbf{b}}\_{E1}, \widehat{\mathbf{b}}\_{Ej}\right) \approx \mathbf{0}$.

In the asymptotic context, the $j$th estimator ($\widehat{\lambda}\_{Ej}$) of the canonical correlations has a normal distribution with expectation $E\left(\widehat{\lambda}\_{Ej}\right) \approx \lambda\_{Ej}$ and variance

$$\operatorname{Var}\left(\widehat{\lambda}\_{Ej}\right) \approx \frac{\left(1 - \lambda\_{Ej}^2\right)^2}{n},\tag{7.26}$$

whereas the $j$th estimator of the square of the canonical correlations, $\widehat{\lambda}\_{Ej}^2$, has a normal distribution with expectation $E\left(\widehat{\lambda}\_{Ej}^2\right) \approx \lambda\_{Ej}^2$ and variance

$$\operatorname{Var}\left(\widehat{\lambda}\_{Ej}^{2}\right) \approx \frac{4\lambda\_{Ej}^{2}\left(1-\lambda\_{Ej}^{2}\right)^{2}}{n}.\tag{7.27}$$

In addition, for $i \neq j$, the correlation between $\widehat{\lambda}\_{Ej}^2$ and $\widehat{\lambda}\_{Ei}^2$ is zero, i.e., $\operatorname{Corr}\left(\widehat{\lambda}\_{Ei}^2, \widehat{\lambda}\_{Ej}^2\right) = 0$ (Bilodeau and Brenner 1999; Muirhead 2005).

Equation (7.26) implies that under the restrictions $\sigma\_H^2 = 1$ and $\sigma\_I^2 = 1$, the expectation and variance of $\widehat{R}\_E = k\_I\widehat{\lambda}\_{E1}$ are $E\left(\widehat{R}\_E\right) \approx k\_I\lambda\_{E1}$ and $\operatorname{Var}\left(\widehat{R}\_E\right) \approx \frac{k\_I^2\left(1 - \lambda\_{E1}^2\right)^2}{n}$ respectively. However, obtaining the expectation and variance of $\widehat{R}\_E = k\_I\widehat{\sigma}\_H\widehat{\lambda}\_{E1}$ or $\widehat{R}\_E = k\_I\sqrt{\widehat{\mathbf{b}}\_{E1}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_{E1}}$ is more difficult, because in both equations there are two estimators: $\widehat{\sigma}\_H$ and $\widehat{\lambda}\_{E1}$ in the first one, and $\widehat{\mathbf{P}}$ and $\widehat{\mathbf{b}}\_{E1}$ in the second one.
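The asymptotic variances in Eqs. (7.26) and (7.27), and the variance of $\widehat{R}\_E = k\_I\widehat{\lambda}\_{E1}$, are simple enough to evaluate directly. The sketch below plugs an estimate in place of the unknown $\lambda\_{E1}^2$, which is an assumption of this illustration; the function name and the input values (taken from the numerical example of Sect. 7.1.5) are used only for demonstration.

```python
import math


def esim_sampling_variances(lambda2, n, k_I):
    """Approximate asymptotic variances under unit variances of H and I."""
    var_lambda = (1.0 - lambda2) ** 2 / n        # Eq. (7.26)
    var_lambda2 = 4.0 * lambda2 * var_lambda     # Eq. (7.27)
    var_R = k_I ** 2 * var_lambda                # Var of R_E = k_I * lambda_E1
    return {
        "var_lambda": var_lambda,
        "var_lambda2": var_lambda2,
        "var_R": var_R,
        "se_lambda": math.sqrt(var_lambda),      # approximate standard error
    }


# Plug-in values: estimated first eigenvalue, number of genotypes, selection intensity.
res = esim_sampling_variances(lambda2=0.4599, n=3330, k_I=1.755)
```

All three variances shrink at rate $1/n$ and vanish as $\lambda\_{E1}^2$ approaches 1, which is the behaviour described in the text.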

#### 7.1.5 Numerical Examples

We compare ESIM efficiency versus LPSI efficiency using a real data set from commercial egg poultry lines obtained from Akbar et al. (1984). The estimated phenotypic ($\widehat{\mathbf{P}}$) and genetic ($\widehat{\mathbf{C}}$) covariance matrices among the rate of lay (RL, number of eggs), age at sexual maturity (SM, days), and egg weight (EW, kg) were

$$\widehat{\mathbf{P}} = \begin{bmatrix} 240.57 & -95.62 & 2.07 \\ -95.62 & 167.20 & 4.58 \\ 2.07 & 4.58 & 22.80 \end{bmatrix} \text{ and } \widehat{\mathbf{C}} = \begin{bmatrix} 29.86 & -17.90 & -4.13 \\ -17.90 & 18.56 & 1.49 \\ -4.13 & 1.49 & 9.24 \end{bmatrix}$$

respectively. The number of genotypes and the vector of economic weights were $n = 3330$ and $\mathbf{w}' = \left[19.54 \;\; -3.56 \;\; 17.01\right]$ respectively, whereas the selection intensity was 10% ($k\_I = 1.755$) for both indices.

The estimated LPSI vector of coefficients was $\widehat{\mathbf{b}}\_S' = \mathbf{w}'\widehat{\mathbf{C}}\widehat{\mathbf{P}}^{-1} = \left[1.82 \;\; -1.38 \;\; 3.25\right]$, whereas the estimated selection response, expected genetic gain per trait, accuracy, and heritability of the LPSI were

$$
\widehat{R}\_S = 1.755\sqrt{\widehat{\mathbf{b}}\_S'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_S} = 74.91, \quad \widehat{\mathbf{E}}\_S' = 1.755\frac{\widehat{\mathbf{b}}\_S'\widehat{\mathbf{C}}}{\sqrt{\widehat{\mathbf{b}}\_S'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_S}} = \left[2.70 \;\; -2.20 \;\; 0.84\right],
$$

$$
\widehat{\rho}\_S = \frac{\sqrt{\widehat{\mathbf{b}}\_S'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_S}}{\sqrt{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}} = 0.362, \text{ and } \widehat{h}\_S^2 = \frac{\widehat{\mathbf{b}}\_S'\widehat{\mathbf{C}}\widehat{\mathbf{b}}\_S}{\widehat{\mathbf{b}}\_S'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_S} = 0.143 \text{ respectively.}
$$

Note that because in the ESIM context $\widehat{\mathbf{b}}\_E'\widehat{\mathbf{b}}\_E = 1$, the best way of comparing ESIM results versus LPSI results is when the LPSI coefficient vector is normalized, i.e., when the LPSI coefficient vector is equal to $\widehat{\mathbf{b}}\_S^{\ast} = \widehat{\mathbf{b}}\_S / \widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S$ and then $\widehat{\mathbf{b}}\_S^{\ast\prime}\widehat{\mathbf{b}}\_S^{\ast} = 1$; however, it can be shown that the normalization process only affects the estimated LPSI selection response because in that case, $\widehat{R}\_S = 74.91$ is divided by $\widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S$. For example, for this data set, $\widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S = 15.76$; then, the estimated LPSI selection response using $\widehat{\mathbf{b}}\_S^{\ast} = \widehat{\mathbf{b}}\_S / \widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S$ is $\widehat{R}\_S = \frac{74.91}{15.74} = 4.75$, whereas the rest of the estimated LPSI parameters are the same. When $0 < \widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S < 1$ and $1 < \widehat{R}\_S$, the values of $\widehat{R}\_S$ increase, but when $1 < \widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S$, the values of $\widehat{R}\_S$ decrease, as in the example.

The product $\widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S$ does not affect $\widehat{\rho}\_S$ because it is invariant to scale change. Also, $\widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S$ does not affect $\widehat{h}\_S^2$ and $\widehat{\mathbf{E}}\_S$ because $\widehat{\mathbf{b}}\_S'\widehat{\mathbf{b}}\_S$ appears in the numerator and denominator of both estimated parameters.
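The LPSI quantities above, and their behaviour under rescaling of the coefficient vector, can be sketched as follows. The matrices and economic weights are assumed values patterned on this example (the signs of the off-diagonal entries and of the second economic weight are assumptions of this sketch), and results agree with the reported figures only up to rounding.

```python
import numpy as np

# Assumed inputs (three traits: RL, SM, EW).
P_hat = np.array([[240.57, -95.62, 2.07],
                  [-95.62, 167.20, 4.58],
                  [2.07, 4.58, 22.80]])
C_hat = np.array([[29.86, -17.90, -4.13],
                  [-17.90, 18.56, 1.49],
                  [-4.13, 1.49, 9.24]])
w = np.array([19.54, -3.56, 17.01])   # economic weights
k_I = 1.755                           # 10% selection intensity


def lpsi_summary(b):
    """Selection response, gain per trait, accuracy and heritability for coefficients b."""
    sd_index = np.sqrt(b @ P_hat @ b)
    R = k_I * sd_index
    E = k_I * (C_hat @ b) / sd_index
    rho = (w @ C_hat @ b) / (np.sqrt(w @ C_hat @ w) * sd_index)
    h2 = (b @ C_hat @ b) / (b @ P_hat @ b)
    return R, E, rho, h2


b_S = np.linalg.solve(P_hat, C_hat @ w)      # LPSI coefficients P^{-1} C w
R_S, E_S, rho_S, h2_S = lpsi_summary(b_S)

# Rescale the coefficients by the scalar b_S'b_S, as in the text.
scale = b_S @ b_S
R_star, E_star, rho_star, h2_star = lpsi_summary(b_S / scale)
```

Only the selection response changes under the rescaling (`R_star` equals `R_S / scale`); the gain per trait, accuracy, and heritability are unchanged, which is the point made in the two paragraphs above.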

In the ESIM, the sign and proportion of the expected genetic gain values for traits RL, SM, and EW should be in accordance with the breeder's interest. For example, if the breeder's interest is that the expected genetic gain per trait should be positive for RL and negative for SM, the sign and proportion of the values of the first eigenvector should be modified using a linear transformation of the estimated first eigenvector $\widehat{\mathbf{b}}\_{E1}$, i.e., $\widehat{\boldsymbol{\beta}} = \mathbf{F}\widehat{\mathbf{b}}\_{E1}$, to achieve expected genetic gain per trait values in RL and SM according to the breeder's interest.

The information needed to obtain the estimated ESIM parameters are matrices

$$\widehat{\mathbf{T}} = \widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}} = \begin{bmatrix} 0.1102 & -0.0405 & -0.0280 \\ -0.0390 & 0.0864 & -0.0184 \\ -0.1833 & 0.0517 & 0.4115 \end{bmatrix} \text{ and } \widehat{\mathbf{T}}\widehat{\mathbf{T}}' = \begin{bmatrix} 0.0146 & -0.0073 & -0.0338 \\ -0.0073 & 0.0093 & 0.0041 \\ -0.0338 & 0.0041 & 0.2056 \end{bmatrix}.$$

We need to find the eigenvalues and eigenvectors of equation $\left(\widehat{\mathbf{T}}\widehat{\mathbf{T}}' - \widehat{\mu}\_j\mathbf{I}\right)\widehat{\mathbf{b}}\_{Ej} = \mathbf{0}$, where $\widehat{\mu}\_j = \widehat{\lambda}\_{Ej}^4$, to obtain matrices $\mathbf{V}\_1$ and $\mathbf{L}^{1/2}$, which form matrix $\widehat{\mathbf{T}} = \mathbf{V}\_1\mathbf{L}^{1/2}\mathbf{V}\_2'$. Matrix $\mathbf{V}\_1$ is equal to

$$\mathbf{V}\_1 = \begin{bmatrix} -0.1701 & 0.6818 & 0.7115 \\ 0.0259 & -0.7187 & 0.6948 \\ 0.9851 & 0.1366 & 0.1046 \end{bmatrix},$$

whereas the diagonal elements of matrix $\mathbf{L}$ are 0.2115, 0.0155, and 0.0025, that is, matrix

$$\mathbf{L}^{1/2} = \begin{bmatrix} 0.4599 & 0 & 0 \\ 0 & 0.1244 & 0 \\ 0 & 0 & 0.0498 \end{bmatrix}.$$

Thus, $\widehat{\mu}\_1 = \widehat{\lambda}\_{E1}^4 = 0.2115$, $\widehat{\lambda}\_{E1}^2 = 0.4599$, and the estimated ESIM accuracy was $\widehat{\lambda}\_{E1} = 0.6782$. The estimated ESIM eigenvector of coefficients is the first column of matrix $\mathbf{V}\_1$, i.e., $\widehat{\mathbf{b}}\_{E1}' = \left[-0.1701 \;\; 0.0259 \;\; 0.9851\right]$, and the estimated ESIM index can be constructed as $\widehat{I}\_E = -0.1701\,\text{RL} + 0.0259\,\text{SM} + 0.9851\,\text{EW}$.

The estimated ESIM selection response and expected genetic gain per trait were $\widehat{R}\_E = 1.755\sqrt{\widehat{\mathbf{b}}\_{E1}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_{E1}} = 9.54$ and $\widehat{\mathbf{E}}\_E' = 1.755\frac{\widehat{\mathbf{b}}\_{E1}'\widehat{\mathbf{C}}}{\sqrt{\widehat{\mathbf{b}}\_{E1}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_{E1}}} = \left[-3.10 \;\; 1.61 \;\; 3.18\right]$ respectively. Because the estimated LPSI selection response was $\widehat{R}\_S = \frac{74.91}{15.74} = 4.75$, the estimated ESIM selection response was higher than the estimated LPSI response. In addition, the estimated LPSI expected genetic gain per trait was $\widehat{\mathbf{E}}\_S' = \left[2.70 \;\; -2.20 \;\; 0.84\right]$. Now, suppose that the breeder's interest is to increase RL and decrease SM; then, $\widehat{\mathbf{E}}\_S'$ meets this objective but $\widehat{\mathbf{E}}\_E'$ does not.

We can change the sign and proportion of $\widehat{\mathbf{E}}\_E'$ by transforming $\widehat{\mathbf{b}}\_{E1}$ into $\widehat{\boldsymbol{\beta}} = \mathbf{F}\widehat{\mathbf{b}}\_{E1}$ using a convenient matrix $\mathbf{F}$ such as $\mathbf{F} = \begin{bmatrix} -9 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$. In such a case, $\widehat{\boldsymbol{\beta}}' = \widehat{\mathbf{b}}\_{E1}'\mathbf{F} = \left[1.531 \;\; 0.026 \;\; 0.985\right]$, $\widehat{R}\_E = 1.755\sqrt{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{P}}\widehat{\boldsymbol{\beta}}} = 42.44$, and $\widehat{\mathbf{E}}\_E' = 1.755\frac{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{C}}}{\sqrt{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{P}}\widehat{\boldsymbol{\beta}}}} = \left[2.990 \;\; -1.85 \;\; 0.205\right]$. However, vector $\widehat{\boldsymbol{\beta}}'$ was not normalized.

To normalize $\widehat{\boldsymbol{\beta}}'$ we need to divide it by $\widehat{\boldsymbol{\beta}}'\widehat{\boldsymbol{\beta}} = 3.314$, but $\widehat{\boldsymbol{\beta}}'\widehat{\boldsymbol{\beta}}$ should only affect $\widehat{R}\_E = 42.44$, which should be divided by 3.314, that is, $\widehat{R}\_E = \frac{42.44}{3.314} = 12.806$. According to the theory of similar matrices (Harville 1997), the estimated maximized ESIM accuracy, $\widehat{\lambda}\_{E1} = 0.6782$, should not be affected by matrix $\mathbf{F}$.
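The effect of the transforming matrix can be reproduced with a few lines of NumPy. In this sketch the covariance matrices, the sign pattern of the first eigenvector, and the reading $\mathbf{F} = \operatorname{diag}(-9, 1, 1)$ are all assumptions; the variable names are hypothetical.

```python
import numpy as np

# Assumed inputs (three traits: RL, SM, EW).
P_hat = np.array([[240.57, -95.62, 2.07],
                  [-95.62, 167.20, 4.58],
                  [2.07, 4.58, 22.80]])
C_hat = np.array([[29.86, -17.90, -4.13],
                  [-17.90, 18.56, 1.49],
                  [-4.13, 1.49, 9.24]])
b_E1 = np.array([-0.1701, 0.0259, 0.9851])   # assumed first eigenvector
F = np.diag([-9.0, 1.0, 1.0])                # assumed transforming matrix
k_I = 1.755

beta = F @ b_E1                              # transformed coefficients
sd_index = np.sqrt(beta @ P_hat @ beta)
R_E = k_I * sd_index                         # response with unnormalized beta
E_E = k_I * (C_hat @ beta) / sd_index        # expected genetic gain per trait

scale = beta @ beta                          # the scalar beta'beta used in the text
R_E_rescaled = R_E / scale                   # response after dividing by beta'beta
```

Under these assumptions the first entry of `E_E` is positive and the second is negative, so the transformed index moves RL up and SM down, in line with the breeder's stated interest.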

We can compare ESIM efficiency versus LPSI efficiency to predict the net genetic merit using the ratio of the estimated ESIM accuracy $\widehat{\lambda}\_{E1} = 0.6782$ to LPSI accuracy $\widehat{\rho}\_S = 0.362$, i.e., $\frac{\widehat{\lambda}\_{E1}}{\widehat{\rho}\_S} = \frac{0.6782}{0.362} = 1.873$, or in percentage terms, $\widehat{p}\_E = 100(1.873 - 1) = 87.3$ (see Chap. 5, Eq. 5.17). According to the latter result, the ESIM is a better predictor of the net genetic merit and its efficiency is 87.3% higher than that of the LPSI for this data set.
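The percentage comparison is a one-line calculation. The helper below is a hypothetical illustration of it, using the two accuracies reported above.

```python
def relative_efficiency(accuracy_new, accuracy_ref):
    """Percentage gain in accuracy of one index over a reference index."""
    return 100.0 * (accuracy_new / accuracy_ref - 1.0)


# Estimated ESIM accuracy versus estimated LPSI accuracy.
p_E = relative_efficiency(0.6782, 0.362)
```

A positive value indicates that the first index predicts the net genetic merit more accurately than the reference; a negative value indicates the opposite.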

Now, we compare ESIM efficiency versus LPSI efficiency using the data set described in Sect. 2.8.1 of Chap. 2. From this data set, we ran five phenotypic selection cycles, each with four traits ($T\_1$, $T\_2$, $T\_3$, and $T\_4$), 500 genotypes, and four replicates for each genotype. The economic weights for $T\_1$, $T\_2$, $T\_3$, and $T\_4$ were 1, 1, 1, and 1 respectively. In this case, matrix $\mathbf{F}$ is an identity matrix of size $4 \times 4$ for all five selection cycles.

Table 7.1 presents the estimated LPSI, the restricted LPSI (RLPSI), and the predetermined proportional gain LPSI (PPG-LPSI) selection response (the latter two for one, two, and three restrictions) for five simulated selection cycles when their vectors of coefficients are normalized. Table 7.1 also presents the estimated ESIM, the RESIM and the PPG-ESIM selection response for one, two, and three restrictions for five simulated selection cycles. The selection intensity was 10% ($k\_I = 1.755$) for all five selection cycles. In this subsection, we compare only LPSI results versus ESIM results. The estimated LPSI selection response when the vector of coefficients was not normalized was described in Chap. 2 (Table 2.4). The averages of the estimated LPSI and ESIM selection responses were 4.70 and 6.31 respectively.

Table 7.2 presents the estimated ESIM expected genetic gain per trait, accuracy ($\widehat{\rho}\_E$), and the values $\widehat{p}\_E = 100(\widehat{\lambda}\_E - 1)$, where $\widehat{\lambda}\_E = \widehat{\rho}\_E/\widehat{\rho}\_S$ is the ratio of $\widehat{\rho}\_E$ to the estimated LPSI accuracy ($\widehat{\rho}\_S$), expressed as percentages. Table 7.2 also presents the accuracy of the PPG-ESIM and the estimated ratio ($\widehat{p}\_{PE}$) of the estimated PPG-ESIM accuracy to the estimated PPG-LPSI accuracy, expressed as percentages, for one, two, and three predetermined restrictions for five simulated selection cycles. In this subsection, we use only the estimated ESIM expected genetic gain per trait and $\widehat{p}\_E = 100(\widehat{\lambda}\_E - 1)$ to compare ESIM efficiency versus LPSI efficiency.

The estimated LPSI expected genetic gains per trait were presented in Chap. 2, Table 2.4. According to the results shown in Table 2.4, the averages of the estimated LPSI expected genetic gain per trait $T\_1$, $T\_2$, $T\_3$, and $T\_4$ for five simulated selection cycles were 7.26, 3.52, 2.78, and 1.58, whereas according to the results of Table 7.2, the averages of the estimated ESIM expected genetic gains per trait were 5.67, 2.67, 1.81, and 2.9 respectively. This means that the estimated LPSI expected genetic gain for traits $T\_1$, $T\_2$, and $T\_3$ was higher than the estimated ESIM expected genetic gain for those traits.

Table 7.1 Estimated linear phenotypic selection index (LPSI), restricted null LPSI (RLPSI), and predetermined proportional gains LPSI (PPG-LPSI) selection responses when their vectors of coefficients are normalized; estimated eigen selection index method (ESIM), restricted null ESIM (RESIM), and predetermined proportional gain ESIM (PPG-ESIM) selection responses for one, two, and three restrictions for five simulated selection cycles

The average of the $\widehat{p}\_E = 100(\widehat{\lambda}\_E - 1)$ values was 9.76 for all five selection cycles (Table 7.2). The latter result is not in accordance with the LPSI and ESIM expected genetic gain per trait; however, note that the $\widehat{p}\_E$ values are associated with the estimated LPSI and ESIM selection responses (Table 7.1), not with the expected genetic gain per trait, because $\widehat{\lambda}\_E = \frac{\widehat{\rho}\_E}{\widehat{\rho}\_S} \approx \frac{\widehat{R}\_E}{\widehat{R}\_S}$, where $\widehat{R}\_E$ and $\widehat{R}\_S$ are the estimated ESIM and LPSI selection responses respectively. Thus, the $\widehat{p}\_E$ values indicate that the efficiency of the ESIM and that of the LPSI were very similar because the former was only 9.76% higher than the latter for this data set.

The equality $\frac{\widehat{\rho}\_E}{\widehat{\rho}\_S} = \frac{\widehat{R}\_E}{\widehat{R}\_S}$ is true only when the denominators of both estimated correlations are the same, as in the linear selection indices described in Chaps. 2–6.

Table 7.2 Estimated eigen selection index method (ESIM) expected genetic gain per trait, accuracy ($\widehat{\rho}\_E$), and ratio of $\widehat{\rho}\_E$ to the estimated LPSI (data not presented) accuracy ($\widehat{\rho}\_S$), expressed in percentage terms, $\widehat{p}\_E = 100(\widehat{\lambda}\_E - 1)$ (where $\widehat{\lambda}\_E = \widehat{\rho}\_E/\widehat{\rho}\_S$). Estimated PPG-ESIM accuracy ($\widehat{\rho}\_P$) and estimated ratio ($\widehat{p}\_{PE}$) of $\widehat{\rho}\_P$ to the estimated accuracy of the PPG-LPSI (data not presented), expressed in percentages (%), for one, two, and three predetermined restrictions for five simulated selection cycles

Note that $\widehat{\rho}\_S = \frac{\sqrt{\widehat{\mathbf{b}}'\_S\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_S}}{\sqrt{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}}$ and $\widehat{\rho}\_E = \frac{\sqrt{\widehat{\mathbf{b}}'\_E\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_E}}{\sqrt{\mathbf{w}'\_E\widehat{\mathbf{C}}\mathbf{w}\_E}}$, whereas $\widehat{R}\_S = \sqrt{\widehat{\mathbf{b}}'\_S\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_S}$ and $\widehat{R}\_E = \sqrt{\widehat{\mathbf{b}}'\_E\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_E}$; this means that if $\sqrt{\mathbf{w}'\_E\widehat{\mathbf{C}}\mathbf{w}\_E} \neq \sqrt{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}$, then $\frac{\widehat{\rho}\_E}{\widehat{\rho}\_S} \neq \frac{\widehat{R}\_E}{\widehat{R}\_S}$. For the Akbar et al. (1984) data, $\widehat{R}\_E = 9.54$ and $\widehat{R}\_S = 4.75$, so $\frac{\widehat{R}\_E}{\widehat{R}\_S} = 2.0$ but $\frac{\widehat{\lambda}\_{E\_1}}{\widehat{\rho}\_S} = 1.873$; that is, $\frac{\widehat{\rho}\_E}{\widehat{\rho}\_S} \approx \frac{\widehat{R}\_E}{\widehat{R}\_S}$, where "$\approx$" indicates an approximation.

Figure 7.1 presents the frequency distribution of 500 estimated ESIM values for cycle 2 (Fig. 7.1a) and cycle 5 (Fig. 7.1b), obtained from one selection cycle for 500 genotypes and four traits simulated in one environment. Figure 7.1a, b indicates that the frequency distribution of the estimated ESIM values approaches a normal distribution.

Fig. 7.1 Frequency distribution of 500 estimated eigen selection index method (ESIM) values for (a) cycle 2 and (b) cycle 5, obtained from one selection cycle for 500 genotypes and four traits simulated in one environment

#### 7.2 The Linear Phenotypic Restricted Eigen Selection Index Method

Similar to the RLPSI (see Chap. 2), the objective of the RESIM is to fix $r$ of $t$ ($r < t$) traits by predicting only the genetic gains of ($t - r$) of them. Let $H = \mathbf{w}'\mathbf{g}$ be the net genetic merit and $I = \mathbf{b}'\mathbf{y}$ the ESIM index. In Chap. 2, we showed that $Cov(I, \mathbf{g}) = \mathbf{C}\mathbf{b}$ is the covariance between the breeding value vector ($\mathbf{g}$) and $I = \mathbf{b}'\mathbf{y}$. Thus, to fix $r$ of $t$ traits, we need $r$ covariances between the linear combinations of $\mathbf{g}$ ($\mathbf{U}'\mathbf{g}$) and $I = \mathbf{b}'\mathbf{y}$ to be zero, i.e., $Cov(I, \mathbf{U}'\mathbf{g}) = \mathbf{U}'\mathbf{C}\mathbf{b} = \mathbf{0}$, where $\mathbf{U}'$ is a matrix with 1s and 0s (1 indicates that the trait is restricted and 0 that the trait has no restrictions). In the RESIM, it is possible to solve this problem by maximizing $\rho^2\_{HI} = \frac{(\mathbf{w}'\mathbf{C}\mathbf{b})^2}{(\mathbf{w}'\mathbf{C}\mathbf{w})(\mathbf{b}'\mathbf{P}\mathbf{b})}$ with respect to vectors $\mathbf{b}$ and $\mathbf{w}$ under the restrictions $\mathbf{U}'\mathbf{C}\mathbf{b} = \mathbf{0}$, $\mathbf{b}'\mathbf{b} = 1$, $\mathbf{w}'\mathbf{C}\mathbf{w} = 1$, and $\mathbf{b}'\mathbf{P}\mathbf{b} = 1$, where $\mathbf{w}'\mathbf{C}\mathbf{w}$ is the variance of $H = \mathbf{w}'\mathbf{g}$ and $\mathbf{b}'\mathbf{P}\mathbf{b}$ is the variance of $I = \mathbf{b}'\mathbf{y}$. Also, the RESIM problem can be solved by maximizing $\frac{\mathbf{b}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ (Eq. 7.12) with respect to vector $\mathbf{b}$ only under the restrictions $\mathbf{U}'\mathbf{C}\mathbf{b} = \mathbf{0}$ and $\mathbf{b}'\mathbf{b} = 1$, as we did to obtain Eq. (7.13). Both approaches give the same result, but it is easier to work with the second approach than with the first one.

#### 7.2.1 The RESIM Parameters

To obtain the RESIM vector of coefficients that maximizes the RESIM selection response and the expected genetic gain per trait, we need to maximize the function

$$f(\mathbf{b}, \mathbf{v}') = \frac{\mathbf{b}' \mathbf{C} \mathbf{b}}{\sqrt{\mathbf{b}' \mathbf{P} \mathbf{b}}} - \mathbf{v}' \mathbf{U}' \mathbf{C} \mathbf{b} \tag{7.28a}$$

with respect to $\mathbf{b}$ and $\mathbf{v}'$, where $\mathbf{v}' = [v\_1 \quad v\_2 \quad \cdots \quad v\_{r-1}]$ is a vector of Lagrange multipliers. The derivatives of Eq. (7.28a) with respect to $\mathbf{b}$ and $\mathbf{v}'$ can be written as

$$2\left(\mathbf{b}'\mathbf{P}\mathbf{b}\right)^{1/2}\mathbf{C}\mathbf{b} - \left(\mathbf{b}'\mathbf{P}\mathbf{b}\right)^{-1/2}(\mathbf{b}'\mathbf{C}\mathbf{b})\mathbf{P}\mathbf{b} - \mathbf{C}\mathbf{U}\mathbf{v} = \mathbf{0} \tag{7.28b}$$

and

$$\mathbf{U}^{\prime}\mathbf{C}\mathbf{b}=\mathbf{0},\tag{7.29}$$

respectively, where Eq. (7.29) denotes the restriction imposed for maximizing Eq. (7.28a). Using algebraic methods on Eq. (7.28b) similar to those used to obtain Eqs. (7.10) and (7.13), we get

$$\left(\mathbf{K}\mathbf{P}^{-1}\mathbf{C} - h\_{I\_{R}}^{2}\mathbf{I}\_{t}\right)\mathbf{b}\_{R} = \mathbf{0},\tag{7.30}$$

where $\mathbf{K} = [\mathbf{I}\_t - \mathbf{Q}\_R]$, $\mathbf{I}\_t$ is an identity matrix of size $t \times t$, $\mathbf{Q}\_R = \mathbf{P}^{-1}\mathbf{C}\mathbf{U}(\mathbf{U}'\mathbf{C}\mathbf{P}^{-1}\mathbf{C}\mathbf{U})^{-1}\mathbf{U}'\mathbf{C}$, and $h^2\_{I\_R} = \frac{\mathbf{b}'\_R\mathbf{C}\mathbf{b}\_R}{\mathbf{b}'\_R\mathbf{P}\mathbf{b}\_R}$ is the maximized RESIM heritability obtained under the restriction $\mathbf{U}'\mathbf{C}\mathbf{b} = \mathbf{0}$; $h^2\_{I\_R}$ is also the square of the maximized correlation between the net genetic merit and $I\_R = \mathbf{b}'\_R\mathbf{y}$, that is, $h^2\_{I\_R} = \lambda^2\_R$. This means that Eq. (7.30) can be written as

$$(\mathbf{K}\mathbf{P}^{-1}\mathbf{C} - \lambda\_R^2 \mathbf{I}\_t)\mathbf{b}\_R = \mathbf{0}.\tag{7.31}$$

Thus, the optimized RESIM index is $I\_R = \mathbf{b}'\_R\mathbf{y}$. The only difference between Eqs. (7.31) and (7.13) is matrix $\mathbf{K}$. Equation (7.31) was obtained by Cerón-Rojas et al. (2008) by maximizing $\rho^2\_{HI}$ (Eq. 7.1) with respect to vectors $\mathbf{b}$ and $\mathbf{w}$ under the restrictions $\mathbf{U}'\mathbf{C}\mathbf{b} = \mathbf{0}$, $\mathbf{b}'\mathbf{b} = 1$, $\mathbf{w}'\mathbf{C}\mathbf{w} = 1$, and $\mathbf{b}'\mathbf{P}\mathbf{b} = 1$ in a similar manner to the canonical correlation theory. The RESIM expected genetic gain per trait uses the first eigenvector ($\mathbf{b}\_R$) of matrix $\mathbf{K}\mathbf{P}^{-1}\mathbf{C}$, whereas the RESIM selection response uses $\mathbf{b}\_R$ and the first eigenvalue ($\lambda^2\_R$) of matrix $\mathbf{K}\mathbf{P}^{-1}\mathbf{C}$. When $\mathbf{U}'$ is a null matrix, $\mathbf{b}\_R = \mathbf{b}\_E$ (the vector of the ESIM coefficients); thus, the RESIM is more general than the ESIM and includes the ESIM as a particular case.
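The construction of $\mathbf{K}$ and the solution of Eq. (7.31) can be illustrated numerically. The following is a minimal sketch that uses only the Python standard library; the matrices `P` and `C` are hypothetical values chosen for illustration (they are not the Akbar et al. data), the helper functions are written for this example only, and the first eigenpair is obtained by power iteration, which is a technique chosen here for brevity and is not the estimation procedure described in Sect. 7.2.2.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def dominant_eigenpair(A, iters=2000):
    """Power iteration: largest eigenvalue and a unit-length eigenvector (column)."""
    b = [[1.0] for _ in A]
    for _ in range(iters):
        b = matmul(A, b)
        norm = math.sqrt(sum(x[0] ** 2 for x in b))
        b = [[x[0] / norm] for x in b]
    Ab = matmul(A, b)
    return sum(x[0] * y[0] for x, y in zip(b, Ab)), b

# Hypothetical covariance matrices for t = 3 traits (illustrative values only).
P = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.4], [0.5, 0.4, 2.0]]   # phenotypic
C = [[1.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.2]]   # genotypic
U = [[1.0], [0.0], [0.0]]                                   # restrict trait 1 only

t = len(P)
P_inv = inverse(P)
Psi = matmul(C, U)                                          # Psi = CU
middle = inverse(matmul(matmul(transpose(Psi), P_inv), Psi))
Q_R = matmul(matmul(matmul(P_inv, Psi), middle), transpose(Psi))
K = [[float(i == j) - Q_R[i][j] for j in range(t)] for i in range(t)]

lam2_R, b_R = dominant_eigenpair(matmul(matmul(K, P_inv), C))

restriction = matmul(matmul(transpose(U), C), b_R)[0][0]   # U'Cb_R, should be ~0
bCb = matmul(matmul(transpose(b_R), C), b_R)[0][0]
bPb = matmul(matmul(transpose(b_R), P), b_R)[0][0]
heritability = bCb / bPb                                    # should equal lam2_R

print(f"lambda_R^2 = {lam2_R:.4f}, h2 = {heritability:.4f}, U'Cb_R = {restriction:.1e}")
```

In this sketch the restriction $\mathbf{U}'\mathbf{C}\mathbf{b}\_R = \mathbf{0}$ holds up to rounding error, and the eigenvalue coincides with the index heritability $\mathbf{b}'\_R\mathbf{C}\mathbf{b}\_R / \mathbf{b}'\_R\mathbf{P}\mathbf{b}\_R$, as stated for Eq. (7.30). Replacing `U` by a matrix with more columns restricts more traits.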

In the RESIM context, vector w can be obtained (Cerón-Rojas et al. 2008) as

$$\mathbf{w}\_{R} = \mathbf{C}^{-1}[\lambda\_{R}\mathbf{P}\mathbf{b}\_{R} + \mathbf{\Psi}\mathbf{v}],\tag{7.32}$$

where $\lambda\_R$ is the square root of the first eigenvalue ($\lambda^2\_R$) and $\mathbf{b}\_R$ is the first eigenvector of matrix $\mathbf{K}\mathbf{P}^{-1}\mathbf{C}$; $\boldsymbol{\Psi} = \mathbf{C}\mathbf{U}$ and $\mathbf{v} = \lambda^{-1}\_R(\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi})^{-1}\boldsymbol{\Psi}'\mathbf{P}^{-1}\mathbf{C}\mathbf{b}\_R$. Let $H\_R = \mathbf{w}'\_R\mathbf{g}$ be the net genetic merit in the RESIM context; then, because the correlation between $I\_R = \mathbf{b}'\_R\mathbf{y}$ and $H\_R = \mathbf{w}'\_R\mathbf{g}$ is not affected by scale change, $\lambda\_R$ and $\lambda^{-1}\_R$ can be considered proportional constants and then $\boldsymbol{\Psi}\mathbf{v}$ can be written as $\boldsymbol{\Psi}\mathbf{v} = \boldsymbol{\Psi}(\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi})^{-1}\boldsymbol{\Psi}'\mathbf{P}^{-1}\mathbf{C}\mathbf{b}\_R = \mathbf{Q}'\_R\mathbf{C}\mathbf{b}\_R$, where $\mathbf{Q}'\_R$ is the transpose of matrix $\mathbf{Q}\_R$ described in Eq. (7.30). Thus, another way of writing Eq. (7.32) is

$$\mathbf{w}\_{\mathcal{R}} = \mathbf{C}^{-1} \left[ \mathbf{P} + \mathbf{Q}\_{\mathcal{R}}^{\prime} \mathbf{C} \right] \mathbf{b}\_{\mathcal{R}}.\tag{7.33}$$

By Eq. (7.33) and the restriction $\mathbf{b}'\boldsymbol{\Psi} = \mathbf{0}$, the covariance between $I\_R = \mathbf{b}'\_R\mathbf{y}$ and $H\_R = \mathbf{w}'\_R\mathbf{g}$ ($\sigma\_{H\_R I\_R}$) can be written as

$$
\sigma\_{H\_R I\_R} = \mathbf{w}\_R' \mathbf{C} \mathbf{b}\_R = \mathbf{b}\_R' \mathbf{P} \mathbf{b}\_R + \mathbf{b}\_R' \mathbf{Q}\_R' \mathbf{C} \mathbf{b}\_R = \mathbf{b}\_R' \mathbf{P} \mathbf{b}\_R,\tag{7.34}
$$

where $\mathbf{b}'\_R\mathbf{Q}'\_R\mathbf{C}\mathbf{b}\_R = 0$ according to the restriction $\mathbf{b}'\boldsymbol{\Psi} = \mathbf{0}$. Equation (7.34) indicates that the covariance between $I\_R$ and $H\_R$ ($\sigma\_{H\_R I\_R}$) is equal to the variance of $I\_R$ ($\sigma^2\_{I\_R} = \mathbf{b}'\_R\mathbf{P}\mathbf{b}\_R$).
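The vanishing term can be made explicit. As a short sketch, writing the transpose of $\mathbf{Q}\_R$ in terms of $\boldsymbol{\Psi} = \mathbf{C}\mathbf{U}$, that is, $\mathbf{Q}'\_R = \boldsymbol{\Psi}(\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi})^{-1}\boldsymbol{\Psi}'\mathbf{P}^{-1}$, and using the symmetry of $\mathbf{C}$, we have

$$\mathbf{b}\_R'\mathbf{Q}\_R'\mathbf{C}\mathbf{b}\_R = \left(\mathbf{b}\_R'\boldsymbol{\Psi}\right)\left(\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}\right)^{-1}\boldsymbol{\Psi}'\mathbf{P}^{-1}\mathbf{C}\mathbf{b}\_R = 0, \qquad \text{because} \quad \mathbf{b}\_R'\boldsymbol{\Psi} = \left(\mathbf{U}'\mathbf{C}\mathbf{b}\_R\right)' = \mathbf{0}'.$$

The leading factor $\mathbf{b}'\_R\boldsymbol{\Psi}$ is exactly the restriction of Eq. (7.29), so the whole product is zero whatever the value of the remaining factors.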

The maximized correlation between IR and HR (or RESIM accuracy) can be written as

$$
\rho\_{H\_R I\_R} = \frac{\sqrt{\mathbf{b}\_R' \mathbf{P} \mathbf{b}\_R}}{\sqrt{\mathbf{w}\_R' \mathbf{C} \mathbf{w}\_R}},\tag{7.35}
$$

where $\mathbf{w}'\_R\mathbf{C}\mathbf{w}\_R = \sigma^2\_{H\_R}$ is the variance of $H\_R$, $\mathbf{w}\_R = \mathbf{C}^{-1}[\mathbf{P} + \mathbf{Q}'\_R\mathbf{C}]\mathbf{b}\_R$, $\mathbf{Q}'\_R = \boldsymbol{\Psi}(\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi})^{-1}\boldsymbol{\Psi}'\mathbf{P}^{-1}$, and $\boldsymbol{\Psi} = \mathbf{C}\mathbf{U}$. When $\mathbf{U}'$ is a null matrix, $\mathbf{w}'\_R\mathbf{C}\mathbf{w}\_R = \mathbf{b}'\_E\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_E = \mathbf{w}'\_E\mathbf{C}\mathbf{w}\_E$, the variance of $H\_E$, and $\sigma^2\_{I\_R} = \mathbf{b}'\_R\mathbf{P}\mathbf{b}\_R = \mathbf{b}'\_E\mathbf{P}\mathbf{b}\_E = \sigma^2\_{I\_E}$, the variance of $I\_E$. Hereafter, to simplify the notation, we write Eq. (7.35) as $\rho\_R$ or $\lambda\_R$.

The maximized selection response (RR) and expected genetic gain per trait (ER) of the RESIM can be written as

$$R\_R = k\_I \sqrt{\mathbf{b}\_R' \mathbf{P} \mathbf{b}\_R} \tag{7.36}$$

and

$$\mathbf{E}\_R = k\_I \frac{\mathbf{C} \mathbf{b}\_R}{\sqrt{\mathbf{b}\_R' \mathbf{P} \mathbf{b}\_R}},\tag{7.37}$$

respectively, where $\sqrt{\mathbf{b}'\_R\mathbf{P}\mathbf{b}\_R} = \sigma\_{I\_R}$ is the standard deviation of $I\_R = \mathbf{b}'\_R\mathbf{y}$. If vector $\mathbf{b}\_R$ is transformed as $\boldsymbol{\beta}\_R = \mathbf{F}\mathbf{b}\_R$, where matrix $\mathbf{F}$ was defined earlier, vector $\mathbf{b}\_R$ should be replaced by $\boldsymbol{\beta}\_R$ in Eqs. (7.36) and (7.37), and in $I\_R = \mathbf{b}'\_R\mathbf{y}$.
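Equations (7.36) and (7.37) translate directly into code. The sketch below uses hypothetical two-trait matrices and a hypothetical coefficient vector, and the function names are illustrative only. It also shows the simplest possible transformation, $\mathbf{F}$ equal to a positive multiple of the identity (an assumption made here for illustration; a general diagonal $\mathbf{F}$ also changes the expected gains), in which case the selection response is rescaled but the expected genetic gain per trait is not.

```python
import math

K_I = 1.755  # selection intensity for 10% selected, as used in this chapter

def quad_form(b, M):
    """Return b'Mb for a vector b and a square matrix M (list of rows)."""
    n = len(b)
    return sum(b[i] * M[i][j] * b[j] for i in range(n) for j in range(n))

def selection_response(b, P, k=K_I):
    """R = k * sqrt(b'Pb), following Eq. (7.36)."""
    return k * math.sqrt(quad_form(b, P))

def expected_gain(b, P, C, k=K_I):
    """E = k * Cb / sqrt(b'Pb), following Eq. (7.37)."""
    sd = math.sqrt(quad_form(b, P))
    n = len(b)
    return [k * sum(C[i][j] * b[j] for j in range(n)) / sd for i in range(n)]

# Hypothetical inputs (illustrative values only).
P = [[4.0, 1.0], [1.0, 3.0]]
C = [[1.0, 0.3], [0.3, 1.5]]
b = [0.6, 0.8]
beta = [2.0 * x for x in b]      # beta = Fb with F = 2 * identity (assumed)

R_b, R_beta = selection_response(b, P), selection_response(beta, P)
E_b, E_beta = expected_gain(b, P, C), expected_gain(beta, P, C)
print(R_b, R_beta)
print(E_b, E_beta)
```

This is why, when the vector of coefficients is not normalized, only the selection response needs to be rescaled, whereas the expected genetic gain per trait stays the same under a proportional change in the coefficients.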

Equation (7.36) can also be written as $R\_R = k\_I\sigma\_{H\_R}\lambda\_R$, where $\sigma\_{H\_R} = \sqrt{\mathbf{b}'\_R\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}\_R + \mathbf{b}'\_R\mathbf{P}\mathbf{C}^{-1}\mathbf{Q}'\_R\mathbf{C}\mathbf{b}\_R}$ is the standard deviation of $H\_R$, and $\lambda\_R = \rho\_{H\_R I\_R}$ is the first canonical correlation between $H\_R = \mathbf{w}'\_R\mathbf{g}$ and $I\_R = \mathbf{b}'\_R\mathbf{y}$. When $\sigma\_{H\_R} = 1$, $\lambda\_R$ is the covariance between $H\_R = \mathbf{w}'\_R\mathbf{g}$ and $I\_R = \mathbf{b}'\_R\mathbf{y}$, and then Eq. (7.36) can be written as $R\_R = k\_I\lambda\_R$. This last result was presented by Cerón-Rojas et al. (2008) in their original paper.

The ratio of the index accuracies and the VPE are also valid in the RESIM context. In Eq. (7.34) we showed that the covariance between $I\_R = \mathbf{b}'\_R\mathbf{y}$ and $H\_R = \mathbf{w}'\_R\mathbf{g}$ ($\sigma\_{H\_R I\_R}$) is equal to the variance of $I\_R = \mathbf{b}'\_R\mathbf{y}$ ($\sigma^2\_{I\_R}$). This means that the VPE of the RESIM can be written as

$$E\left[\left(H\_R - I\_R\right)^2\right] = \sigma\_{H\_R}^2 + \sigma\_{I\_R}^2 - 2\sigma\_{H\_R I\_R} = \sigma\_{H\_R}^2 - \sigma\_{I\_R}^2 = \left(1 - \rho\_R^2\right)\sigma\_{H\_R}^2. \tag{7.38}$$

Statistical properties associated with the ESIM and described in Sect. 7.1.2 are also valid for the RESIM.

#### 7.2.2 Estimating the RESIM Parameters

We can estimate the RESIM parameters in a similar manner to the ESIM parameters in Sect. 7.1.4. With matrices $\widehat{\mathbf{C}}$ and $\widehat{\mathbf{P}}$, we constructed matrix $\widehat{\mathbf{S}}\_R = \widehat{\mathbf{K}}\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}$ and equation

$$\left(\widehat{\mathbf{S}}\_{R}\widehat{\mathbf{S}}\_{R}^{\prime}-\widehat{\mu}\_{R\_j}\mathbf{I}\_{t}\right)\widehat{\mathbf{b}}\_{R\_j}=\mathbf{0},\tag{7.39}$$

where $\widehat{\mu}\_{R\_j} = \widehat{\lambda}^4\_{R\_j}$, $j = 1, 2, \ldots, t$. The estimated RESIM index ($I\_R = \mathbf{b}'\_R\mathbf{y}$) is $\widehat{I}\_R = \widehat{\mathbf{b}}'\_{R\_1}\mathbf{y}$ and the estimators of the maximized RESIM selection response and its expected genetic gain per trait can be denoted as $\widehat{R}\_R = k\_I\sqrt{\widehat{\mathbf{b}}'\_{R\_1}\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_{R\_1}}$ and $\widehat{\mathbf{E}}\_R = k\_I\frac{\widehat{\mathbf{C}}\widehat{\mathbf{b}}\_{R\_1}}{\sqrt{\widehat{\mathbf{b}}'\_{R\_1}\widehat{\mathbf{P}}\widehat{\mathbf{b}}\_{R\_1}}}$ respectively, whereas the estimator of the maximized RESIM accuracy is $\widehat{\lambda}\_{R\_1}$.

#### 7.2.3 Numerical Examples

We compare the RLPSI results with those of the RESIM using the Akbar et al. (1984) data described in Sect. 7.1.5. We restrict the trait RL (number of eggs) in both indices. In Chap. 3, Sect. 3.1.3, we indicated how to construct matrix $\mathbf{U}'$ and, in Sect. 3.1.4 of the same chapter, we described how to obtain matrix $\widehat{\mathbf{K}} = \mathbf{I}\_t - \widehat{\mathbf{Q}}$ for one and two restrictions. Matrix $\widehat{\mathbf{K}}$ is the same for the RLPSI and the RESIM. Thus, in this subsection we omit the steps needed to construct matrices $\mathbf{U}'$ and $\widehat{\mathbf{K}}$.

First, we estimate the RLPSI parameters. Assume a selection intensity of 10% ($k\_I = 1.755$) and a vector of economic weights $\mathbf{w}' = [19.54 \quad 3.56 \quad 17.01]$. The estimated RLPSI vector of coefficients for one restriction was $\widehat{\mathbf{b}}' = [0.29 \quad 0.84 \quad 5.78]$, and the estimated selection response, expected genetic gain per trait, accuracy, and heritability of the RLPSI were $\widehat{R} = 1.755\sqrt{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}} = 53.01$, $\widehat{\mathbf{E}}' = 1.755\frac{\widehat{\mathbf{b}}'\widehat{\mathbf{C}}}{\sqrt{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}}} = [0 \quad 0.71 \quad 2.96]$, $\widehat{\rho} = \frac{\sqrt{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}}}{\sqrt{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}} = 0.26$, and $\widehat{h}^2 = \frac{\widehat{\mathbf{b}}'\widehat{\mathbf{C}}\widehat{\mathbf{b}}}{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}} = 0.33$ respectively. In this case, $\widehat{\mathbf{b}}'\widehat{\mathbf{b}} = 34.25$; then, the estimated RLPSI selection response using the normalized RLPSI vector of coefficients was $\widehat{R} = \frac{53.01}{34.25} = 1.55$, and the rest of the estimated RLPSI parameters were the same. In the RESIM, matrix $\mathbf{F}$ was an identity matrix of size $3 \times 3$; that is, we did not use matrix $\mathbf{F}$ to transform the RESIM vector of coefficients. In Sect. 7.1.5 we obtained matrix

$$\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}} = \begin{bmatrix} 0.1102 & 0.0405 & 0.0280 \\ 0.0390 & 0.0864 & 0.0184 \\ 0.1833 & 0.0517 & 0.4115 \end{bmatrix},$$

and we have indicated that matrix $\widehat{\mathbf{K}}$ is the same for the RLPSI and the RESIM. In the RESIM, we need matrix $\widehat{\mathbf{S}}\_R = \widehat{\mathbf{K}}\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}$ to solve equation $\left(\widehat{\mathbf{S}}\_R\widehat{\mathbf{S}}'\_R - \widehat{\mu}\_{R\_j}\mathbf{I}\_t\right)\widehat{\mathbf{b}}\_{R\_j} = \mathbf{0}$, where $\widehat{\mu}\_{R\_j} = \widehat{\lambda}^4\_{R\_j}$, whence we shall obtain the eigenvalues and eigenvectors that form matrices $\mathbf{L}^{1/2}\_R$, $\mathbf{V}\_{R1}$, and $\widehat{\mathbf{S}}\_R = \mathbf{V}\_{R1}\mathbf{L}^{1/2}\_R\mathbf{V}'\_{R2}$.

For one null restriction, matrix

$$\widehat{\mathbf{S}}\_R = \widehat{\mathbf{K}}\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}} = \begin{bmatrix} 0 & 0.0285 & 0.0232 \\ 0 & 0.0620 & 0.0365 \\ 0 & 0.0630 & 0.3263 \end{bmatrix}.$$

This means that $\widehat{\mathbf{S}}\_R$ reflects the trait restrictions imposed on the covariance between the RESIM and the vector of genotypic values; thus, if $r$ traits are restricted, $r$ columns of $\widehat{\mathbf{S}}\_R$ are equal to zero. Matrices

$$\widehat{\mathbf{S}}\_R\widehat{\mathbf{S}}'\_R = \begin{bmatrix} 0.0013 & 0.0009 & 0.0058 \\ 0.0009 & 0.0052 & 0.0158 \\ 0.0058 & 0.0158 & 0.1104 \end{bmatrix} \quad \text{and} \quad \mathbf{V}\_{R1} = \begin{bmatrix} 0.0500 & 0.5216 & 0.8517 \\ 0.1446 & 0.8476 & 0.5106 \\ 0.9882 & 0.0976 & 0.1178 \end{bmatrix},$$

whereas the $\widehat{\mu}\_{R\_j} = \widehat{\lambda}^4\_{R\_j}$ values were 0.1130, 0.0039, and 0.0, whence

$$\mathbf{L}^{1/2}\_R = \begin{bmatrix} 0.3362 & 0 & 0 \\ 0 & 0.0626 & 0 \\ 0 & 0 & 0.0 \end{bmatrix}.$$

Thus, $\widehat{\mu}\_{R\_1} = \widehat{\lambda}^4\_{R\_1} = 0.1130$, $\widehat{\lambda}^2\_{R\_1} = 0.3362$, and the estimated RESIM accuracy was $\widehat{\lambda}\_{R\_1} = 0.5798$. The estimated RESIM eigenvector, index, the selection response, and expected genetic gain per trait were $\widehat{\mathbf{b}}'\_{R\_1} = [0.0500 \quad 0.1446 \quad 0.9882]$, $\widehat{I}\_R = \widehat{\mathbf{b}}'\_{R\_1}\mathbf{y}$ (with $\mathbf{y}' = [\text{RL} \quad \text{SM} \quad \text{EW}]$),

$$
\widehat{R}\_R = 1.755 \sqrt{\widehat{\mathbf{b}}\_{R\_1}^{\prime} \widehat{\mathbf{P}} \widehat{\mathbf{b}}\_{R\_1}} = 9.06, \quad \text{and} \qquad \widehat{\mathbf{E}}\_R^{\prime} = 1.755 \frac{\widehat{\mathbf{b}}\_{R\_1}^{\prime} \widehat{\mathbf{C}}}{\sqrt{\widehat{\mathbf{b}}\_{R\_1}^{\prime} \widehat{\mathbf{P}} \widehat{\mathbf{b}}\_{R\_1}}} = \begin{bmatrix} 0 & -0.72 & 2.96 \end{bmatrix}.
$$

respectively.
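Some of the reported quantities can be cross-checked with elementary arithmetic. The following sketch (variable names are illustrative) uses the entries of $\widehat{\mathbf{S}}\_R$ as printed above. Because the diagonal of $\widehat{\mathbf{S}}\_R\widehat{\mathbf{S}}'\_R$ depends only on the squares of those entries, the check does not depend on their signs. It also recovers $\widehat{\lambda}^2\_{R\_1}$ and $\widehat{\lambda}\_{R\_1}$ from $\widehat{\mu}\_{R\_1} = \widehat{\lambda}^4\_{R\_1}$.

```python
import math

# Entries of S_R as printed in the text (only their squares are used below).
S_R = [[0.0, 0.0285, 0.0232],
       [0.0, 0.0620, 0.0365],
       [0.0, 0.0630, 0.3263]]
mu = [0.1130, 0.0039, 0.0]            # reported eigenvalues of S_R S_R'

# Diagonal of S_R S_R' = row-wise sums of squares; its trace equals the sum
# of the eigenvalues, up to the rounding of the published values.
diag = [sum(x * x for x in row) for row in S_R]
trace_gap = abs(sum(diag) - sum(mu))

lam2_R1 = math.sqrt(mu[0])            # first eigenvalue of K P^{-1} C
lam_R1 = math.sqrt(lam2_R1)           # estimated RESIM accuracy
print(diag, trace_gap)
print(round(lam2_R1, 4), round(lam_R1, 4))
```

The diagonal entries agree with the printed matrix $\widehat{\mathbf{S}}\_R\widehat{\mathbf{S}}'\_R$ to within rounding of the fourth decimal, and the two square roots reproduce the values 0.3362 and 0.5798 given in the text.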

The estimated RLPSI selection response was $\widehat{R} = \frac{53.01}{34.25} = 1.55$; thus, the estimated RESIM selection response was higher than the estimated RLPSI response. In addition, the estimated RLPSI expected genetic gain per trait was $\widehat{\mathbf{E}}' = [0 \quad 0.71 \quad 2.96]$, which is the same as the estimated RESIM expected genetic gain per trait.

We can compare RESIM efficiency versus RLPSI efficiency to predict the net genetic merit using the ratio of the estimated RESIM accuracy $\widehat{\lambda}\_{R\_1} = 0.5798$ to the RLPSI accuracy $\widehat{\rho} = 0.26$, i.e., $\frac{\widehat{\lambda}\_{R\_1}}{\widehat{\rho}} = \frac{0.5798}{0.26} = 2.23$, or in percentage terms, $\widehat{p}\_E = 100(2.23 - 1) = 123$ (see Chap. 5, Eq. 5.17). That is, the RESIM is a better predictor of the net genetic merit and its efficiency was 123% higher than the RLPSI efficiency for this data set.

Now, we compare RESIM efficiency versus RLPSI efficiency using the simulated data set described in Sect. 2.8.1 of Chap. 2 for five phenotypic selection cycles, each with four traits ($T\_1$, $T\_2$, $T\_3$, and $T\_4$), 500 genotypes, and four replicates for each genotype. The economic weights for $T\_1$, $T\_2$, $T\_3$, and $T\_4$ were 1, 1, 1, and 1 respectively. For this data set, matrix $\mathbf{F}$ was equal to an identity matrix of size $4 \times 4$ for all five selection cycles.

The first and second parts of columns 3, 4, and 5 of Table 7.1 present the estimated RLPSI and RESIM selection responses respectively for one, two, and three null restrictions for five simulated selection cycles, where the selection intensity was 10% ($k\_I = 1.755$) for all five selection cycles. The averages of the estimated RLPSI selection response for each null restriction were 4.43, 4.30, and 4.92, whereas the averages of the estimated RESIM selection response were 4.54, 4.42, and 4.38 respectively. These results indicate that the estimated RLPSI selection response was greater than the estimated RESIM selection response only for three null restrictions.

The first part of Table 7.3 presents the estimated RESIM expected genetic gain per trait for one, two, and three restrictions for five simulated selection cycles. The estimated RLPSI expected genetic gains per trait for one, two, and three restrictions are given in Chap. 3 (Table 3.3). According to the results shown in Table 3.3 (Chap. 3), the averages of the estimated RLPSI expected genetic gains per trait for five simulated selection cycles were 2.52, 2.25, and 2.26 for one restriction; 2.84 and 2.65 for two restrictions; and 3.90 for three restrictions. According to the results shown in Table 7.3, the averages of the estimated RESIM expected genetic gains per trait for five simulated selection cycles were 0.43, 0.75, and 3.90 for one restriction; 0.59 and 3.89 for two restrictions; and 3.90 for three restrictions. This means that the RESIM and RLPSI were the same only for three restrictions, whereas for one and two restrictions, the average of the estimated RESIM expected genetic gains per trait was higher than that of the estimated RLPSI expected genetic gains per trait only for trait 4.

Table 7.3 Estimated RESIM and PPG-ESIM expected genetic gain per trait for one, two, and three restrictions for five simulated selection cycles

The selection intensity was 10% ($k\_I = 1.755$) and the vectors of the PPG for each predetermined restriction were $\mathbf{d}'\_1 = 7$, $\mathbf{d}'\_2 = [7 \quad 3]$, and $\mathbf{d}'\_3 = [7 \quad 3 \quad 5]$ respectively

Figure 7.2 presents the estimated accuracy of the RLPSI and the RESIM for one, two, and three null restrictions for five simulated selection cycles. In all five selection cycles, the estimated RESIM accuracy was greater than the RLPSI accuracy. This means that the RESIM is a better predictor of the net genetic merit than the RLPSI. Additional results associated with the frequency distribution of the estimated RESIM values are presented in Fig. 7.3. Figure 7.3a presents the frequency distribution of the estimated RESIM values with one null restriction for cycle 2, whereas Fig. 7.3b presents the frequency distribution of the estimated RESIM values with two null restrictions for cycle 5; both figures indicate that the estimated RESIM values approach normal distribution.

Finally, in Chap. 10 we present the results of comparing the ESIM with the LPSI and the RESIM with the RLPSI for many selection cycles. Such results are similar to those obtained in this chapter.

Fig. 7.2 Estimated correlation values between the restricted linear phenotypic selection index (RLPSI) and the net genetic merit ($H = \mathbf{w}'\mathbf{g}$); estimated correlation values between the restricted eigen selection index method (RESIM) and $H$ for one, two and three null restrictions for four traits and 500 genotypes in one environment simulated for five selection cycles

Fig. 7.3 Frequency distribution of 500 estimated RESIM values for (a) cycle 2 and (b) cycle 5, obtained from one selection cycle for 500 genotypes and four traits simulated in one environment

#### 7.3 The Linear Phenotypic Predetermined Proportional Gain Eigen Selection Index Method

In a similar manner to the PPG-LPSI (see Chap. 3), in the PPG-ESIM the breeder pre-sets optimal levels (predetermined proportional gains) on certain traits before the selection is carried out. Let $\mathbf{d}' = [d\_1 \quad d\_2 \quad \cdots \quad d\_r]$ be the vector of the PPGs (predetermined proportional gains) imposed by the breeder on $r$ traits and assume that $\mu\_q$ is the population mean of the $q$th trait before selection. The objective of the PPG-ESIM is to change $\mu\_q$ to $\mu\_q + d\_q$, where $d\_q$ is a predetermined change in $\mu\_q$ (in the RESIM, $d\_q = 0$, $q = 1, 2, \cdots, r$, where $r$ is the number of PPGs). That is, the PPG-ESIM attempts to make some traits change their expected genetic gain values based on a predetermined level, whereas the rest of the traits remain without restrictions.

The simplest way to solve the foregoing problem is by maximizing the PPG-ESIM heritability under the restriction $\mathbf{D}'\mathbf{U}'\mathbf{C}\mathbf{b} = \mathbf{0}$, where

$$\mathbf{D}' = \begin{bmatrix} d\_r & 0 & \cdots & 0 & -d\_1 \\ 0 & d\_r & \cdots & 0 & -d\_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d\_r & -d\_{r-1} \end{bmatrix}$$

(see Chap. 3 for details) is a matrix of size $(r - 1) \times r$.


The Mallard (1972) matrix of predetermined restrictions can be written as $\mathbf{M}' = \mathbf{D}'\boldsymbol{\Psi}'$, where $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$ and $\mathbf{U}'$ is the Kempthorne and Nordskog (1959) matrix of restrictions of 1s and 0s (1 indicates that the trait is restricted, i.e., $d\_q = 0$, and 0 that the trait has no restrictions).
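The structure of matrix $\mathbf{D}'$ is easy to generate programmatically. The sketch below assumes the layout described in Chap. 3, with $d\_r$ on the diagonal and $-d\_q$ in the last column; the function name `mallard_d_matrix` is hypothetical. It uses the vector $\mathbf{d}'\_3$ as printed in the note to Table 7.3 and checks the defining property $\mathbf{D}'\mathbf{d} = \mathbf{0}$, which holds for any vector of PPGs.

```python
def mallard_d_matrix(d):
    """Build the (r - 1) x r matrix D' from the PPG vector d' = [d_1, ..., d_r]."""
    r = len(d)
    rows = []
    for q in range(r - 1):
        row = [0.0] * r
        row[q] = float(d[-1])       # d_r on the diagonal
        row[-1] = -float(d[q])      # -d_q in the last column
        rows.append(row)
    return rows

d = [7.0, 3.0, 5.0]                  # d'_3 as printed in the note to Table 7.3
D_t = mallard_d_matrix(d)
residual = [sum(x * y for x, y in zip(row, d)) for row in D_t]   # D'd

for row in D_t:
    print(row)
print(residual)
```

Because every row of $\mathbf{D}'$ is orthogonal to $\mathbf{d}$, the restriction $\mathbf{D}'\mathbf{U}'\mathbf{C}\mathbf{b} = \mathbf{0}$ forces the expected gains of the restricted traits to be proportional to $\mathbf{d}$ rather than fixing them at zero.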

To find the PPG-ESIM vector of coefficients that maximizes the PPG-ESIM selection response and expected genetic gain per trait, we can maximize $\rho^2\_{HI} = \frac{(\mathbf{w}'\mathbf{C}\mathbf{b})^2}{(\mathbf{w}'\mathbf{C}\mathbf{w})(\mathbf{b}'\mathbf{P}\mathbf{b})}$ with respect to vectors $\mathbf{b}$ and $\mathbf{w}$ under the restrictions $\mathbf{M}'\mathbf{b} = \mathbf{0}$, $\mathbf{b}'\mathbf{b} = 1$, $\mathbf{w}'\mathbf{C}\mathbf{w} = 1$, and $\mathbf{b}'\mathbf{P}\mathbf{b} = 1$, where $\mathbf{w}'\mathbf{C}\mathbf{w}$ is the variance of $H = \mathbf{w}'\mathbf{g}$ and $\mathbf{b}'\mathbf{P}\mathbf{b}$ is the variance of $I = \mathbf{b}'\mathbf{y}$, as did Cerón-Rojas et al. (2016) according to the canonical correlation theory, or we can solve this problem by maximizing $\frac{\mathbf{b}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ (Eq. 7.12) only with respect to vector $\mathbf{b}$ under the restrictions $\mathbf{M}'\mathbf{b} = \mathbf{0}$ and $\mathbf{b}'\mathbf{b} = 1$, as we did to obtain the RESIM vector of coefficients. Both approaches give the same result, but we use the latter approach because it is easier to work with.

#### 7.3.1 The PPG-ESIM Parameters

To obtain the PPG-ESIM vector of coefficients, we need to maximize the function

$$f(\mathbf{b}, \mathbf{v}') = \frac{\mathbf{b}' \mathbf{C} \mathbf{b}}{\sqrt{\mathbf{b}' \mathbf{P} \mathbf{b}}} - \mathbf{v}' \mathbf{M}' \mathbf{b} \tag{7.40}$$

with respect to vectors $\mathbf{b}$ and $\mathbf{v}'$, where $\mathbf{v}' = [v_1 \;\; v_2 \;\; \cdots \;\; v_{r-1}]$ is a vector of Lagrange multipliers. The derivatives of Eq. (7.40) with respect to $\mathbf{b}$ and $\mathbf{v}'$ are:

$$2\left(\mathbf{b'Pb}\right)^{1/2}\mathbf{Cb} - \left(\mathbf{b'Pb}\right)^{-1/2}(\mathbf{b'Cb})\mathbf{Pb} - \mathbf{Mv} = \mathbf{0} \tag{7.41}$$

and

$$\mathbf{M}'\mathbf{b}=\mathbf{0},\tag{7.42}$$

respectively, where Eq. (7.42) denotes the restriction imposed for maximizing Eq. (7.40). By using algebraic methods on Eq. (7.41) similar to those used to obtain Eq. (7.10) we get

$$\left(\mathbf{K}_P \mathbf{P}^{-1} \mathbf{C} - \lambda_P^2 \mathbf{I}_t\right) \mathbf{b}_P = \mathbf{0},\tag{7.43}$$

where $\mathbf{K}_P = [\mathbf{I}_t - \mathbf{Q}_P]$, $\mathbf{Q}_P = \mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D}(\mathbf{D}'\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D})^{-1}\mathbf{D}'\boldsymbol{\Psi}'$, $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$, $\mathbf{I}_t$ is a $t \times t$ identity matrix, and $\lambda_P^2 = h_{I_P}^2$ and $\mathbf{b}_P$ are the first eigenvalue and the first eigenvector of matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$ respectively. Note that $h_{I_P}^2$ is the PPG-ESIM heritability and $\lambda_P$ is the maximum correlation between $I_P = \mathbf{b}_P'\mathbf{y}$ and $H = \mathbf{w}'\mathbf{g}$. When $\mathbf{D}' = \mathbf{U}'$, $\mathbf{b}_P = \mathbf{b}_R$ (the vector of coefficients of the RESIM), and when $\mathbf{U}'$ is a null matrix, $\mathbf{b}_P = \mathbf{b}_E$ (the vector of coefficients of the ESIM). That is, the PPG-ESIM is more general than the RESIM and the ESIM and includes the latter two indices as particular cases. Matrices $\mathbf{K}_P$ and $\mathbf{Q}_P$ are the same as those obtained in the PPG-LPSI (see Chap. 3). Also, vector $\mathbf{b}_P$ can be transformed as $\boldsymbol{\beta}_P = \mathbf{F}\mathbf{b}_P$; matrix $\mathbf{F}$ was defined earlier.
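A minimal NumPy sketch of Eq. (7.43) follows. It assumes symmetric positive definite $\mathbf{P}$ and $\mathbf{C}$, at least two restricted traits, and $d_r \neq 0$; the function names and the small covariance matrices are hypothetical and serve only to show the order of the matrix operations.

```python
import numpy as np


def mallard(d):
    """Mallard matrix D' of size (r-1) x r built from the PPG vector d."""
    d = np.asarray(d, dtype=float)
    r = d.size
    D_t = np.zeros((r - 1, r))
    D_t[np.arange(r - 1), np.arange(r - 1)] = d[-1]  # d_r on the diagonal
    D_t[:, -1] = -d[:-1]                             # -d_q in the last column
    return D_t


def ppg_esim_coefficients(P, C, U, d):
    """Leading eigenpair of K_P P^{-1} C (Eq. 7.43); b is scaled so that b'b = 1."""
    t = P.shape[0]
    Psi = C @ U                       # Psi = C U, i.e. Psi' = U'C (C symmetric)
    M = Psi @ mallard(d).T            # M = Psi D, i.e. M' = D'Psi'
    P_inv = np.linalg.inv(P)
    Q = P_inv @ M @ np.linalg.solve(M.T @ P_inv @ M, M.T)
    K = np.eye(t) - Q
    vals, vecs = np.linalg.eig(K @ P_inv @ C)
    j = int(np.argmax(vals.real))     # position of the first (largest) eigenvalue
    b = vecs[:, j].real
    return float(vals[j].real), b / np.linalg.norm(b)


# Hypothetical covariance matrices for t = 3 traits (P = C plus a residual part).
C = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
P = C + np.diag([3.0, 2.0, 1.0])
U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # restrict traits 1 and 2
d = np.array([3.0, 1.0])

lam2, b = ppg_esim_coefficients(P, C, U, d)
gains = U.T @ C @ b                   # U'Cb, which the restriction forces to be proportional to d
print(lam2, b, gains)
```

Because $\mathbf{M}'\mathbf{b}_P = \mathbf{0}$, the two entries of `gains` should stand in the ratio $d_1 : d_2$; the sign of the eigenvector returned by the solver is arbitrary.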

Let $\mathbf{S}_P = \boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}$; then, under the assumption $\mathbf{D}'\mathbf{d} = \mathbf{0}$, it is possible to show that $\mathbf{D}(\mathbf{D}'\mathbf{S}_P\mathbf{D})^{-1}\mathbf{D}' = \mathbf{S}_P^{-1} - \mathbf{S}_P^{-1}\mathbf{d}(\mathbf{d}'\mathbf{S}_P^{-1}\mathbf{d})^{-1}\mathbf{d}'\mathbf{S}_P^{-1}$ (see Chap. 3), whence by substituting $\mathbf{S}_P^{-1} - \mathbf{S}_P^{-1}\mathbf{d}(\mathbf{d}'\mathbf{S}_P^{-1}\mathbf{d})^{-1}\mathbf{d}'\mathbf{S}_P^{-1}$ for $\mathbf{D}(\mathbf{D}'\mathbf{S}_P\mathbf{D})^{-1}\mathbf{D}'$ in matrix $\mathbf{Q}_P = \mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D}(\mathbf{D}'\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D})^{-1}\mathbf{D}'\boldsymbol{\Psi}'$, matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$ can be written as

$$\mathbf{K}_P \mathbf{P}^{-1} \mathbf{C} = \left[ \mathbf{I}_t - \mathbf{P}^{-1} \boldsymbol{\Psi} \mathbf{S}_P^{-1} \boldsymbol{\Psi}' \right] \mathbf{P}^{-1} \mathbf{C} + \mathbf{A}_P,\tag{7.44}$$

where $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$, $\mathbf{A}_P = \boldsymbol{\delta}\boldsymbol{\alpha}'$, $\boldsymbol{\delta} = \mathbf{P}^{-1}\boldsymbol{\Psi}(\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi})^{-1}\mathbf{d}$, and $\boldsymbol{\alpha}' = \dfrac{\mathbf{d}'\mathbf{S}_P^{-1}\boldsymbol{\Psi}'\mathbf{P}^{-1}\mathbf{C}}{\mathbf{d}'\mathbf{S}_P^{-1}\mathbf{d}}$. When $\mathbf{A}_P$ is a null matrix, $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C} = \mathbf{K}\mathbf{P}^{-1}\mathbf{C}$ (matrix of the RESIM), and if $\mathbf{U}'$ is a null matrix, $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C} = \mathbf{P}^{-1}\mathbf{C}$ (matrix of the ESIM). This means that Eq. (7.44) is a mathematically equivalent form of matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$ that does not require matrix $\mathbf{D}'$. The easiest way to obtain $\mathbf{b}_P$ and $\lambda_P$ is to use matrix $[\mathbf{I}_t - \mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{S}_P^{-1}\boldsymbol{\Psi}']\mathbf{P}^{-1}\mathbf{C} + \mathbf{A}_P$ in Eq. (7.43) instead of matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$.
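The equivalence stated in Eq. (7.44) can be checked numerically. The sketch below computes $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$ once with the Mallard matrix and once with the $\mathbf{D}'$-free form, writing `S` for $\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}$. The randomly generated covariance matrices, the PPG vector, and all function names are assumptions made for illustration only.

```python
import numpy as np


def mallard(d):
    """Mallard matrix D' of size (r-1) x r built from the PPG vector d."""
    d = np.asarray(d, dtype=float)
    r = d.size
    D_t = np.zeros((r - 1, r))
    D_t[np.arange(r - 1), np.arange(r - 1)] = d[-1]
    D_t[:, -1] = -d[:-1]
    return D_t


def kpc_with_D(P, C, U, d):
    """K_P P^{-1} C computed from Q_P, which uses the Mallard matrix."""
    Psi = C @ U
    M = Psi @ mallard(d).T
    P_inv = np.linalg.inv(P)
    Q = P_inv @ M @ np.linalg.solve(M.T @ P_inv @ M, M.T)
    return (np.eye(P.shape[0]) - Q) @ P_inv @ C


def kpc_without_D(P, C, U, d):
    """Right-hand side of Eq. (7.44), which needs d but not D'."""
    d = np.asarray(d, dtype=float)
    Psi = C @ U
    P_inv = np.linalg.inv(P)
    S_inv = np.linalg.inv(Psi.T @ P_inv @ Psi)
    delta = P_inv @ Psi @ S_inv @ d
    alpha = (d @ S_inv @ Psi.T @ P_inv @ C) / (d @ S_inv @ d)
    A_P = np.outer(delta, alpha)
    return (np.eye(P.shape[0]) - P_inv @ Psi @ S_inv @ Psi.T) @ P_inv @ C + A_P


rng = np.random.default_rng(0)
G = rng.normal(size=(4, 4))
C = G @ G.T + np.eye(4)                 # hypothetical genotypic covariance matrix
P = C + np.diag([2.0, 1.5, 1.0, 0.5])   # hypothetical phenotypic covariance matrix
U = np.eye(4)[:, :3]                    # restrict the first three of four traits
d = [7.0, 3.0, 5.0]

left = kpc_with_D(P, C, U, d)
right = kpc_without_D(P, C, U, d)
print(np.max(np.abs(left - right)))     # should be at rounding-error level
```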

In the PPG-ESIM context, vector w can be obtained as

$$\mathbf{w}_P = \mathbf{C}^{-1} [\lambda_P \mathbf{P} \mathbf{b}_P + \mathbf{M} \mathbf{v}_P],\tag{7.45}$$

whence $H = \mathbf{w}'\mathbf{g}$ can be written as $H_P = \mathbf{w}_P'\mathbf{g}$. In Eq. (7.45), $\lambda_P$ is the maximum correlation between $I_P = \mathbf{b}_P'\mathbf{y}$ and $H_P = \mathbf{w}_P'\mathbf{g}$, $\mathbf{b}_P$ is the first eigenvector of matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$, $\mathbf{v}_P = \lambda_P^{-1}(\mathbf{M}'\mathbf{P}^{-1}\mathbf{M})^{-1}\mathbf{M}'\mathbf{P}^{-1}\mathbf{C}\mathbf{b}_P$, $\mathbf{M}' = \mathbf{D}'\boldsymbol{\Psi}'$, and $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$. In a similar manner to the RESIM context, we can assume that $\lambda_P$ and $\lambda_P^{-1}$ are proportionality constants and it can be shown that the covariance between $I_P = \mathbf{b}_P'\mathbf{y}$ and $H_P = \mathbf{w}_P'\mathbf{g}$ ($\sigma_{H_P I_P}$) is equal to the variance of $I_P = \mathbf{b}_P'\mathbf{y}$ ($\sigma_{I_P}^2 = \mathbf{b}_P'\mathbf{P}\mathbf{b}_P$), that is, $\sigma_{H_P I_P} = \mathbf{w}_P'\mathbf{C}\mathbf{b}_P = \mathbf{b}_P'\mathbf{P}\mathbf{b}_P$.
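The equality between the covariance and the variance can be traced in a few steps. The following derivation is a sketch that takes $\mathbf{w}_P = \mathbf{C}^{-1}[\lambda_P\mathbf{P}\mathbf{b}_P + \mathbf{M}\mathbf{v}_P]$ from Eq. (7.45), uses the symmetry of $\mathbf{P}$ and $\mathbf{C}$, and applies the restriction $\mathbf{M}'\mathbf{b}_P = \mathbf{0}$ of Eq. (7.42):

$$\sigma_{H_P I_P} = \mathbf{w}_P'\mathbf{C}\mathbf{b}_P = \left[\lambda_P \mathbf{b}_P'\mathbf{P} + \mathbf{v}_P'\mathbf{M}'\right]\mathbf{C}^{-1}\mathbf{C}\mathbf{b}_P = \lambda_P \mathbf{b}_P'\mathbf{P}\mathbf{b}_P + \mathbf{v}_P'\mathbf{M}'\mathbf{b}_P = \lambda_P \mathbf{b}_P'\mathbf{P}\mathbf{b}_P.$$

Treating $\lambda_P$ as a proportionality constant, as assumed in the text, leaves $\sigma_{H_P I_P} = \mathbf{b}_P'\mathbf{P}\mathbf{b}_P = \sigma_{I_P}^2$.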

The accuracy of the PPG-ESIM can also be written as

$$\rho_{H_P I_P} = \frac{\sqrt{\mathbf{b}_P'\mathbf{P}\mathbf{b}_P}}{\sqrt{\mathbf{w}_P'\mathbf{C}\mathbf{w}_P}},\tag{7.46}$$

where $\sigma_{H_P}^2 = \mathbf{w}_P'\mathbf{C}\mathbf{w}_P = \mathbf{b}_P'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}_P + \mathbf{b}_P'\mathbf{P}\mathbf{C}^{-1}\mathbf{Q}_P'\mathbf{C}\mathbf{b}_P$ is the variance of $H_P$. When $\mathbf{D}' = \mathbf{U}'$, $\mathbf{w}_P'\mathbf{C}\mathbf{w}_P = \mathbf{w}_R'\mathbf{C}\mathbf{w}_R$ (the variance of $H_R$), and when $\mathbf{U}'$ is a null matrix, $\mathbf{w}_P'\mathbf{C}\mathbf{w}_P = \mathbf{w}_E'\mathbf{C}\mathbf{w}_E$ (the variance of $H_E$). Hereafter, to simplify the notation, we write Eq. (7.46) as $\rho_P$ or $\lambda_P$.

Let $\boldsymbol{\beta}_P = \mathbf{F}\mathbf{b}_P$ be the PPG-ESIM vector of coefficients transformed by matrix $\mathbf{F}$. By Eqs. (7.1) and (7.46), the maximized selection response ($R_P$) and expected genetic gain per trait ($\mathbf{E}_P$) of the PPG-ESIM can be written as

$$R_P = k_I \sqrt{\boldsymbol{\beta}_P' \mathbf{P} \boldsymbol{\beta}_P} \tag{7.47}$$

and

$$\mathbf{E}_P = k_I \frac{\mathbf{C} \boldsymbol{\beta}_P}{\sqrt{\boldsymbol{\beta}_P' \mathbf{P} \boldsymbol{\beta}_P}},\tag{7.48}$$

respectively, where $\sqrt{\boldsymbol{\beta}_P'\mathbf{P}\boldsymbol{\beta}_P} = \sigma_{I_P}$ is the standard deviation of $I_P = \boldsymbol{\beta}_P'\mathbf{y}$. Equations (7.47) and (7.48) do not require economic weights. When $\mathbf{F}$ is an identity matrix, $\boldsymbol{\beta}_P = \mathbf{b}_P$, $I_P = \mathbf{b}_P'\mathbf{y}$, $R_P = k_I\sqrt{\mathbf{b}_P'\mathbf{P}\mathbf{b}_P}$, and $\mathbf{E}_P = k_I\dfrac{\mathbf{C}\mathbf{b}_P}{\sqrt{\mathbf{b}_P'\mathbf{P}\mathbf{b}_P}}$.

Equation (7.47) can also be written as $R_P = k_I\sigma_{H_P}\lambda_P$, where $\sigma_{H_P} = \sqrt{\mathbf{b}_P'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}_P + \mathbf{b}_P'\mathbf{P}\mathbf{C}^{-1}\mathbf{Q}_P'\mathbf{C}\mathbf{b}_P}$ is the standard deviation of $H_P$, and $\lambda_P$ is the canonical correlation between $H_P$ and $I_P = \boldsymbol{\beta}_P'\mathbf{y}$. When $\sigma_{H_P} = 1$, Eq. (7.47) can be written as $R_P = k_I\lambda_P$, where $\lambda_P$ is the covariance between $I_P = \mathbf{b}_P'\mathbf{y}$ and $H_P = \mathbf{w}_P'\mathbf{g}$.
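Equations (7.47) and (7.48) translate almost directly into code. The sketch below is only an illustration: the function name `ppg_esim_response`, the default $k_I = 1.755$ (the 10% selection intensity used in this chapter), and the toy matrices are assumptions, and any coefficient vector can be passed in.

```python
import numpy as np


def ppg_esim_response(b, P, C, F=None, k_I=1.755):
    """Selection response (Eq. 7.47) and expected gain per trait (Eq. 7.48)."""
    b = np.asarray(b, dtype=float)
    beta = b if F is None else F @ b       # beta_P = F b_P
    sigma_I = np.sqrt(beta @ P @ beta)     # standard deviation of the index
    R = k_I * sigma_I
    E = k_I * (C @ beta) / sigma_I
    return R, E


# Toy example with two traits and hypothetical covariance matrices.
P = np.diag([4.0, 9.0])
C = np.array([[2.0, 0.5], [0.5, 3.0]])
R, E = ppg_esim_response([1.0, 0.0], P, C)

# Changing the sign of a coefficient through F leaves R unchanged and flips E.
R_flip, E_flip = ppg_esim_response([1.0, 0.0], P, C, F=np.diag([-1.0, 1.0]))
print(R, E, R_flip, E_flip)
```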

The prediction efficiency of the PPG-ESIM can be obtained in a similar manner to the ESIM and RESIM. The accuracy of the PPG-ESIM (Eq. 7.46) can be used to construct the ratio of index accuracies. The PPG-ESIM mean square error or the VPE can be obtained as

$$E\left[\left(H_P - I_P\right)^2\right] = \sigma_{H_P}^2 + \sigma_{I_P}^2 - 2\sigma_{H_P I_P} = \sigma_{H_P}^2 - \sigma_{I_P}^2 = \left(1 - \rho_P^2\right)\sigma_{H_P}^2. \tag{7.49}$$

Additional properties associated with the ESIM are also valid for the PPG-ESIM.

#### 7.3.2 Estimating PPG-ESIM Parameters

The procedure used to estimate PPG-ESIM parameters is the same as that described for RESIM. Let $\widehat{\mathbf{C}}$ and $\widehat{\mathbf{P}}$ be the estimated matrices of $\mathbf{C}$ and $\mathbf{P}$. In the PPG-ESIM context, we use matrix $\widehat{\mathbf{S}} = \widehat{\mathbf{K}}_P\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}$ to obtain the estimated eigenvalues and eigenvectors of equation

$$(\widehat{\mathbf{S}} - \widehat{\lambda}_{Pj}^2 \mathbf{I}_t)\widehat{\mathbf{b}}_{Pj} = \mathbf{0},\tag{7.50}$$

$j = 1, 2, \ldots, t$, where $t$ is the number of traits in the PPG-ESIM index, $\widehat{\mathbf{K}}_P = \mathbf{I}_t - \widehat{\mathbf{Q}}_P$, $\mathbf{I}_t$ is an identity matrix of size $t \times t$ and $\widehat{\mathbf{Q}}_P = \widehat{\mathbf{P}}^{-1}\widehat{\boldsymbol{\Psi}}\mathbf{D}(\mathbf{D}'\widehat{\boldsymbol{\Psi}}'\widehat{\mathbf{P}}^{-1}\widehat{\boldsymbol{\Psi}}\mathbf{D})^{-1}\mathbf{D}'\widehat{\boldsymbol{\Psi}}'$. As $\widehat{\mathbf{S}}$ is an asymmetric matrix, the values of $\widehat{\mathbf{b}}_{Pj}$ and $\widehat{\lambda}_{Pj}^2$ should be obtained using SVD (singular value decomposition).

According to SVD, we need to solve equation

$$\left(\widehat{\mathbf{S}}\widehat{\mathbf{S}}'-\widehat{\mu}_{Pj}\mathbf{I}_{t}\right)\widehat{\mathbf{b}}_{Pj}=\mathbf{0},\tag{7.51}$$

where $\widehat{\mu}_{Pj} = \widehat{\lambda}_{Pj}^4$ ($j = 1, 2, \ldots, t$). By Eq. (7.51), the estimated PPG-ESIM index ($I_P = \mathbf{b}_P'\mathbf{y}$) is $\widehat{I}_P = \widehat{\mathbf{b}}_{P_1}'\mathbf{y}$. The estimator of the maximized PPG-ESIM selection response, and its expected genetic gain per trait, can be denoted as $\widehat{R}_P = k_I\sqrt{\widehat{\mathbf{b}}_{P_1}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{P_1}}$ and $\widehat{\mathbf{E}}_P = k_I\dfrac{\widehat{\mathbf{C}}\widehat{\mathbf{b}}_{P_1}}{\sqrt{\widehat{\mathbf{b}}_{P_1}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{P_1}}}$ respectively, whereas the estimator of the maximized accuracy of the PPG-ESIM is $\widehat{\lambda}_{P_1}$.
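The SVD step can be sketched with NumPy as follows. The code relies on the fact that the eigenvalues of $\widehat{\mathbf{S}}\widehat{\mathbf{S}}'$ are the squared singular values of $\widehat{\mathbf{S}}$, and it follows the text's convention $\widehat{\mu}_{Pj} = \widehat{\lambda}_{Pj}^4$. The function name and the random input matrix are illustrative assumptions, not the data used in the numerical examples.

```python
import numpy as np


def ppg_esim_from_svd(S_hat):
    """First left singular vector of S_hat and the implied lambda estimates.

    The eigenvalues of S S' are the squared singular values of S, so with
    mu = lambda^4 (the convention in the text) lambda^2 is the singular value.
    """
    left, sing, _ = np.linalg.svd(S_hat)   # singular values come sorted, largest first
    mu = sing ** 2                         # eigenvalues of S S' (Eq. 7.51)
    lam2 = sing                            # lambda^2 = mu^(1/2)
    lam = np.sqrt(lam2)                    # lambda = mu^(1/4)
    return left[:, 0], mu, lam2, lam


rng = np.random.default_rng(1)
S_hat = rng.normal(size=(3, 3))            # arbitrary asymmetric stand-in for K_P P^{-1} C
b1, mu, lam2, lam = ppg_esim_from_svd(S_hat)

# b1 should satisfy Eq. (7.51) for the largest eigenvalue mu[0].
residual = S_hat @ S_hat.T @ b1 - mu[0] * b1
print(np.max(np.abs(residual)))
```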

#### 7.3.3 Numerical Examples

We compare the results of the PPG-LPSI and the PPG-ESIM using the Akbar et al. (1984) data described earlier. We restrict traits RL and SM in both indices using the PPG vector $\mathbf{d}' = [3 \;\; 1]$. In Chap. 3, Sect. 3.1.4, we indicated how to construct matrix $\mathbf{U}'$ and, in Sect. 3.2.4 of the same chapter, we described how to obtain matrix $\widehat{\mathbf{K}}_P$ for one and two restrictions. Matrix $\widehat{\mathbf{K}}_P$ is the same for the PPG-LPSI and the PPG-ESIM. Thus, we omit the steps for constructing matrices $\mathbf{U}'$ and $\widehat{\mathbf{K}}_P$.

Assume a selection intensity of 10% ($k_I = 1.755$) and that the vector of economic weights is $\mathbf{w}' = [19.54 \;\; 3.56 \;\; 17.01]$. The estimated PPG-LPSI vector of coefficients for two predetermined restrictions was $\widehat{\mathbf{b}}' = [1.70 \;\; 1.04 \;\; 2.93]$, and its estimated selection response, expected genetic gain per trait, accuracy, and heritability were $\widehat{R} = 1.755\sqrt{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}} = 49.02$, $\widehat{\mathbf{E}}' = 1.755\dfrac{\widehat{\mathbf{b}}'\widehat{\mathbf{C}}}{\sqrt{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}}} = [1.25 \;\; 0.42 \;\; 1.36]$, $\widehat{\rho} = \dfrac{\sqrt{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}}}{\sqrt{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}} = 0.24$, and $\widehat{h}^2 = \dfrac{\widehat{\mathbf{b}}'\widehat{\mathbf{C}}\widehat{\mathbf{b}}}{\widehat{\mathbf{b}}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}} = 0.12$ respectively. In this case, $\widehat{\mathbf{b}}'\widehat{\mathbf{b}} = 12.57$; then, the estimated PPG-LPSI selection response using the normalized PPG-LPSI vector of coefficients was $\widehat{R} = \dfrac{49.02}{12.57} = 3.90$, whereas the rest of the estimated PPG-LPSI parameters were the same.

In the PPG-ESIM, we need matrix $\widehat{\mathbf{S}} = \widehat{\mathbf{K}}_P\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}$ to obtain the eigenvalues and eigenvectors of $(\widehat{\mathbf{S}}\widehat{\mathbf{S}}' - \widehat{\mu}_{Pj}\mathbf{I}_t)\widehat{\mathbf{b}}_{Pj} = \mathbf{0}$ that make up matrices $\mathbf{L}_P^{1/2}$ and $\mathbf{V}_{P_1}$ in the decomposition $\widehat{\mathbf{S}} = \mathbf{V}_{P_1}\mathbf{L}_P^{1/2}\mathbf{V}_{P_2}'$, where $\widehat{\mu}_{Pj} = \widehat{\lambda}_{Pj}^4$. It can be shown that

$$\widehat{\mathbf{S}} = \widehat{\mathbf{K}}_P\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}} = \begin{bmatrix} 0.1047 & 0.0349 & 0.0279 \\ 0.0678 & 0.0226 & 0.0213 \\ 0.1970 & 0.0657 & 0.4119 \end{bmatrix}, \quad \widehat{\mathbf{S}}\widehat{\mathbf{S}}' = \begin{bmatrix} 0.0130 & 0.0085 & 0.0344 \\ 0.0085 & 0.0056 & 0.0236 \\ 0.0344 & 0.0236 & 0.2118 \end{bmatrix},$$

$$\text{and} \quad \mathbf{V}_{P_1} = \begin{bmatrix} 0.1663 & 0.8292 & 0.5336 \\ 0.1138 & 0.5214 & 0.8457 \\ 0.9795 & 0.2014 & 0.0076 \end{bmatrix},$$

whereas the $\widehat{\mu}_{Pj} = \widehat{\lambda}_{Pj}^4$ values were 0.2214, 0.0099, and 0.0, whence

$$\mathbf{L}_P^{1/2} = \begin{bmatrix} 0.4705 & 0 & 0 \\ 0 & 0.0997 & 0 \\ 0 & 0 & 0.0 \end{bmatrix}.$$

Thus, $\widehat{\mu}_{P_1} = \widehat{\lambda}_{P_1}^4 = 0.2214$, $\widehat{\lambda}_{P_1}^2 = 0.4705$, and the estimated maximized PPG-ESIM accuracy was $\widehat{\lambda}_{P_1} = 0.6859$.
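The chain from the largest eigenvalue of $\widehat{\mathbf{S}}\widehat{\mathbf{S}}'$ to the estimated accuracy is two successive square roots. The short check below uses only the rounded value 0.2214 reported above, so its results agree with the reported 0.4705 and 0.6859 only up to rounding; the variable names are illustrative.

```python
import math

mu_1 = 0.2214                   # reported largest eigenvalue of S S'
lam2_1 = math.sqrt(mu_1)        # lambda^2 = mu^(1/2)
lam_1 = math.sqrt(lam2_1)       # lambda = mu^(1/4), the estimated accuracy
print(lam2_1, lam_1)
```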

We transformed the first eigenvector $\widehat{\mathbf{b}}_{P_1}' = [0.1663 \;\; 0.1138 \;\; 0.9795]$ using matrix $\mathbf{F} = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ to obtain vector $\widehat{\boldsymbol{\beta}}_P' = \widehat{\mathbf{b}}_{P_1}'\mathbf{F} = [1.4968 \;\; 0.1138 \;\; 0.9795]$ and $\widehat{\boldsymbol{\beta}}_P'\widehat{\boldsymbol{\beta}}_P = 3.21$, whence the estimates of the index, the selection response, and expected genetic gain per trait of the PPG-ESIM were

$$\widehat{I}_P = 1.4968\,\text{RL} + 0.1138\,\text{SM} + 0.9795\,\text{EW}, \quad \widehat{R}_P = \frac{1.755\sqrt{\widehat{\boldsymbol{\beta}}_P' \widehat{\mathbf{P}} \widehat{\boldsymbol{\beta}}_P}}{\widehat{\boldsymbol{\beta}}_P' \widehat{\boldsymbol{\beta}}_P} = \frac{43.01}{3.21} = 13.39, \quad \text{and} \quad \widehat{\mathbf{E}}_P' = 1.755\frac{\widehat{\boldsymbol{\beta}}_P' \widehat{\mathbf{C}}}{\sqrt{\widehat{\boldsymbol{\beta}}_P' \widehat{\mathbf{P}} \widehat{\boldsymbol{\beta}}_P}} = [3.05 \;\; 1.96 \;\; 0.19]$$

respectively. The estimated PPG-LPSI selection response was $\widehat{R} = \dfrac{49.02}{12.57} = 3.90$, which means that the estimated PPG-ESIM selection response was greater than the estimated PPG-LPSI response.
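The transformation by $\mathbf{F}$ and the normalization of the two responses can be retraced with plain arithmetic. The sketch below uses the rounded values printed in this example, so the results match the reported figures only up to rounding; the variable names are illustrative.

```python
b_p1 = [0.1663, 0.1138, 0.9795]      # reported first eigenvector
f_diag = [9.0, 1.0, 1.0]             # diagonal of F

# F is diagonal, so the transformed vector is an element-wise product.
beta = [f * b for f, b in zip(f_diag, b_p1)]
beta_sq_norm = sum(x * x for x in beta)      # beta'beta, reported as 3.21

r_ppg_esim = 43.01 / 3.21            # reported numerator divided by beta'beta
r_ppg_lpsi = 49.02 / 12.57           # reported numerator divided by b'b
print(beta, beta_sq_norm, r_ppg_esim, r_ppg_lpsi)
```

Both responses are divided by the squared norm of their coefficient vector, which puts the two indices on a comparable scale before they are compared.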

We compared PPG-ESIM efficiency versus PPG-LPSI efficiency to predict the net genetic merit using the ratio of the estimated PPG-ESIM accuracy ($\widehat{\lambda}_{P_1} = 0.6859$) to PPG-LPSI accuracy ($\widehat{\rho} = 0.24$), i.e., $\dfrac{\widehat{\lambda}_{P_1}}{\widehat{\rho}} = \dfrac{0.6859}{0.24} = 2.858$ or, in percentage terms, $\widehat{p}_P = 100(2.858 - 1) = 185.80$. Then, the PPG-ESIM was a better predictor of the net genetic merit and its efficiency was 185.80% higher than that of the PPG-LPSI for this data set.
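The percentage comparison above is a one-line formula. As a small sketch (the function name `relative_efficiency` is an illustrative choice), it can be written as:

```python
def relative_efficiency(acc_new, acc_ref):
    """Percentage by which one index accuracy exceeds a reference accuracy."""
    return 100.0 * (acc_new / acc_ref - 1.0)


# Reported accuracies: PPG-ESIM 0.6859 and PPG-LPSI 0.24.
p_hat = relative_efficiency(0.6859, 0.24)
print(round(p_hat, 1))   # about 185.8
```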

Now, we compare PPG-ESIM efficiency versus PPG-LPSI efficiency using the data set described in Sect. 2.8.1 of Chap. 2 for five phenotypic selection cycles, each with four traits ($T_1$, $T_2$, $T_3$, and $T_4$), 500 genotypes, and four replicates for each genotype. The economic weights for $T_1$, $T_2$, $T_3$, and $T_4$ were 1, 1, 1, and 1 respectively. For this data set, matrix $\mathbf{F}$ was an identity matrix of size $4 \times 4$ for all five selection cycles.

The first and second parts of columns 6, 7, and 8 in Table 7.1 present the estimated PPG-LPSI and PPG-ESIM selection responses for one, two, and three predetermined restrictions for five simulated selection cycles. The selection intensity was 10% ($k_I = 1.755$) and the vectors of PPG for each predetermined restriction were $\mathbf{d}_1' = 7$, $\mathbf{d}_2' = [7 \;\; 3]$, and $\mathbf{d}_3' = [7 \;\; 3 \;\; 5]$ respectively, for all five selection cycles. The estimated PPG-LPSI selection response when the vector of coefficients was not normalized was presented in Chap. 3 (Table 3.5). The averages of the estimated PPG-LPSI selection response for each predetermined restriction were 4.70, 4.91, and 3.14, whereas the averages of the estimated PPG-ESIM selection response were 6.31, 6.28, and 6.75 respectively. These results indicate that the estimated PPG-ESIM selection response was greater than the estimated PPG-LPSI selection response for all predetermined restrictions.

The second part of Table 7.2 presents the estimated PPG-ESIM accuracy ($\widehat{\rho}_P$) and the ratio of $\widehat{\rho}_P$ to the estimated PPG-LPSI accuracy ($\widehat{\rho}$), expressed in percentage terms, $\widehat{p}_P = 100(\widehat{\lambda}_P - 1)$, where $\widehat{\lambda}_P = \widehat{\rho}_P/\widehat{\rho}$, for one, two, and three predetermined restrictions for five simulated selection cycles. The estimated PPG-LPSI accuracies were presented in Chap. 3 (Table 3.6). The average estimated PPG-ESIM efficiency for each restriction was 9.76%, 11.71%, and 29.03% greater than the PPG-LPSI efficiency for this data set in all five selection cycles.

The second part of Table 7.3 presents the estimated PPG-ESIM expected genetic gain per trait for one, two, and three predetermined restrictions for five simulated selection cycles. The estimated PPG-LPSI expected genetic gains per trait for one, two, and three predetermined restrictions were presented in Chap. 3, Table 3.5, where it can be seen that the averages of the estimated PPG-LPSI expected genetic gains per trait for five simulated selection cycles were 6.85, 3.25, 2.62, and 1.48 for one restriction; 6.93, 2.97, 2.65, and 1.45 for two restrictions; and 5.20, 2.23, 3.72, and 1.43 for three restrictions. For the same set of restrictions, the averages of the estimated PPG-ESIM expected genetic gain per trait were 5.67, 2.67, 1.81, and 2.97 for one restriction; 5.89, 2.52, 2.04, and 2.83 for two restrictions; and 5.71, 2.45, 4.08, and 0.82 for three restrictions (Table 7.3). Because the vectors of predetermined proportional gains for each predetermined restriction were $\mathbf{d}_1' = 7$, $\mathbf{d}_2' = [7 \;\; 3]$, and $\mathbf{d}_3' = [7 \;\; 3 \;\; 5]$, the averages of the estimated PPG-LPSI expected genetic gains per trait were closer to the predetermined values than those of the estimated PPG-ESIM for one and two predetermined restrictions, whereas for three restrictions, the results of both selection indices were similar.

### Chapter 8 Linear Molecular and Genomic Eigen Selection Index Methods

Abstract The three main linear phenotypic eigen selection index methods are the eigen selection index method (ESIM), the restricted ESIM (RESIM) and the predetermined proportional gain ESIM (PPG-ESIM). The ESIM is an unrestricted index, but the RESIM and PPG-ESIM allow null and predetermined restrictions respectively to be imposed on the expected genetic gains of some traits, whereas the rest remain without any restrictions. These indices are based on the canonical correlation, on the singular value decomposition, and on the linear phenotypic selection indices theory. We extend the ESIM theory to the molecular-assisted and genomic selection context to develop a molecular ESIM (MESIM), a genomic ESIM (GESIM), and a genome-wide ESIM (GW-ESIM). We also extend the RESIM and PPG-ESIM theory to the restricted genomic ESIM (RGESIM) and to the predetermined proportional gain genomic ESIM (PPG-GESIM) respectively. The latter five indices use marker and phenotypic information jointly to predict the net genetic merit of the candidates for selection. However, whereas the MESIM uses only statistically significant markers linked to quantitative trait loci, the GW-ESIM uses all genome markers and phenotypic information, and the GESIM, RGESIM, and PPG-GESIM use the genomic estimated breeding values and the phenotypic values to predict the net genetic merit. Using real and simulated data, we validated the theoretical results of all five indices.

#### 8.1 The Molecular Eigen Selection Index Method

The molecular eigen selection index method (MESIM) is very similar to the linear molecular selection index (LMSI) described in Chap. 4; thus, it uses the same set of information to predict the net genetic merit of individual candidates for selection, and therefore needs the same set of conditions as those of the LMSI. The only difference between the two indices is how the vector of coefficients is obtained and the assumption associated with the vector of economic weights. Thus, although the LMSI obtains the vector of coefficients according to the linear phenotypic selection index (LPSI) described in Chap. 2 and assumes that the economic weights are known and fixed, the MESIM assumes that the economic weights are unknown and fixed and obtains the vector of coefficients according to the ESIM theory.

### 8.1.1 The MESIM Parameters

In the MESIM context, the net genetic merit can be written as

$$H = \mathbf{w}_1' \mathbf{g} + \mathbf{w}_2' \mathbf{s} = \begin{bmatrix} \mathbf{w}_1' & \mathbf{w}_2' \end{bmatrix} \begin{bmatrix} \mathbf{g} \\ \mathbf{s} \end{bmatrix} = \mathbf{w}' \mathbf{a},\tag{8.1}$$

where $\mathbf{g}' = [g_1 \;\; \cdots \;\; g_t]$ is the vector of true breeding values, $t$ is the number of traits, $\mathbf{w}_1' = [w_1 \;\; \cdots \;\; w_t]$ is a vector of unknown economic weights associated with $\mathbf{g}$, $\mathbf{w}_2' = [0_1 \;\; \cdots \;\; 0_t]$ is a null vector associated with the vector of marker score values $\mathbf{s}' = [s_1 \;\; s_2 \;\; \cdots \;\; s_t]$, $\mathbf{w}' = [\mathbf{w}_1' \;\; \mathbf{w}_2']$ and $\mathbf{a}' = [\mathbf{g}' \;\; \mathbf{s}']$ (see Chap. 4 for details). The MESIM index can be written as

$$I = \boldsymbol{\beta}_y' \mathbf{y} + \boldsymbol{\beta}_s' \mathbf{s} = \begin{bmatrix} \boldsymbol{\beta}_y' & \boldsymbol{\beta}_s' \end{bmatrix} \begin{bmatrix} \mathbf{y} \\ \mathbf{s} \end{bmatrix} = \boldsymbol{\beta}' \mathbf{t}, \tag{8.2}$$

where $\mathbf{y}' = [y_1 \;\; \cdots \;\; y_t]$ is the vector of phenotypic values; $\mathbf{s}' = [s_1 \;\; s_2 \;\; \cdots \;\; s_t]$ is the vector of marker scores; $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_s$ are vectors of phenotypic and marker score weight values respectively, $\boldsymbol{\beta}' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_s']$ and $\mathbf{t}' = [\mathbf{y}' \;\; \mathbf{s}']$. The objectives of the MESIM are the same as those of the ESIM (see Chap. 7 for details).

Let $\operatorname{Var}(H) = \mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w} = \sigma_H^2$ be the variance of $H$, $\operatorname{Var}(I) = \boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta} = \sigma_I^2$ the variance of $I$, and $\operatorname{Cov}(H, I) = \mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta}$ the covariance between $H$ and $I$, where

$$\boldsymbol{\Psi}_M = \operatorname{Var}\begin{bmatrix} \mathbf{g} \\ \mathbf{s} \end{bmatrix} = \begin{bmatrix} \mathbf{C} & \mathbf{S}_M \\ \mathbf{S}_M & \mathbf{S}_M \end{bmatrix} \quad \text{and} \quad \mathbf{T}_M = \operatorname{Var}\begin{bmatrix} \mathbf{y} \\ \mathbf{s} \end{bmatrix} = \begin{bmatrix} \mathbf{P} & \mathbf{S}_M \\ \mathbf{S}_M & \mathbf{S}_M \end{bmatrix}$$

are block matrices of size $2t \times 2t$ ($t$ is the number of traits) of covariance matrices, where $\mathbf{P}$, $\mathbf{S}_M$, and $\mathbf{C}$ are $t \times t$ covariance matrices of phenotypic ($\mathbf{y}$), marker score ($\mathbf{s}$), and genetic breeding ($\mathbf{g}$) values respectively. Let $\rho_{HI} = \dfrac{\mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w}}\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}$ and $h_I^2 = \dfrac{\boldsymbol{\beta}'\boldsymbol{\Psi}_M\boldsymbol{\beta}}{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$ be the correlation between $H$ and $I$, and the heritability of $I$ respectively; then, the MESIM selection response can be written as
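The two block matrices can be assembled with `numpy.block`. The sketch below is a minimal illustration with $t = 2$ traits; the function names and the small covariance matrices are hypothetical values chosen only so that all matrices are symmetric and positive definite.

```python
import numpy as np


def mesim_blocks(P, C, S_M):
    """Return the 2t x 2t block matrices Psi_M and T_M."""
    Psi_M = np.block([[C, S_M], [S_M, S_M]])
    T_M = np.block([[P, S_M], [S_M, S_M]])
    return Psi_M, T_M


def index_heritability(beta, Psi_M, T_M):
    """Heritability of the index I = beta't, i.e. beta'Psi_M beta / beta'T_M beta."""
    beta = np.asarray(beta, dtype=float)
    return (beta @ Psi_M @ beta) / (beta @ T_M @ beta)


# Hypothetical covariance matrices for t = 2 traits.
S_M = np.array([[1.0, 0.2], [0.2, 0.8]])           # marker score covariances
C = S_M + np.array([[1.0, 0.1], [0.1, 0.7]])       # genetic covariances
P = C + np.diag([2.0, 1.5])                        # phenotypic covariances

Psi_M, T_M = mesim_blocks(P, C, S_M)
h2 = index_heritability([1.0, 0.0, 0.0, 0.0], Psi_M, T_M)
print(Psi_M.shape, T_M.shape, h2)
```

With the weight vector that selects only the first phenotypic value, the heritability reduces to the ratio of the first diagonal elements of $\mathbf{C}$ and $\mathbf{P}$.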

$$R = k_I \sigma_H \rho_{HI} \tag{8.3}$$

and

$$R = k_I \sigma_I h_I^2,\tag{8.4}$$

where $k_I$ is the standardized selection differential (or selection intensity) associated with MESIM; $\sigma_H = \sqrt{\mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w}}$ and $\sigma_I = \sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$ are the standard deviations of $H$ and $I$ respectively. It is assumed that $k_I$ is fixed, and that matrices $\mathbf{T}_M$ and $\boldsymbol{\Psi}_M$ are known; therefore, we can maximize $R$ by maximizing $\rho_{HI}$ (Eq. 8.3) with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$, or by maximizing $h_I^2$ (Eq. 8.4) only with respect to vector $\boldsymbol{\beta}$.

Maximizing $h_I^2$ only with respect to $\boldsymbol{\beta}$ is simpler than maximizing $\rho_{HI}$ with respect to $\mathbf{w}$ and $\boldsymbol{\beta}$; however, in the latter case the maximization process of $\rho_{HI}$ gives more information associated with MESIM parameters than when $h_I^2$ is maximized only with respect to $\boldsymbol{\beta}$ (see Chap. 7, Eq. 7.13, for details). In this subsection, we maximize $\rho_{HI}$ with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ similar to the ESIM in Chap. 7, Sect. 7.1.1. Thus, we omit the steps and details of the maximization process of $\rho_{HI}$.

We maximize $\rho_{HI} = \dfrac{\mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w}}\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}$ with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ under the restrictions $\sigma_H^2 = \mathbf{w}'\boldsymbol{\Psi}\mathbf{w}$, $\sigma_I^2 = \boldsymbol{\beta}'\mathbf{T}\boldsymbol{\beta}$, and $0 < \sigma_H^2, \sigma_I^2 < \infty$, where $\sigma_H^2$ is the variance of $H = \mathbf{w}'\mathbf{a}$ and $\sigma_I^2$ is the variance of $I = \boldsymbol{\beta}'\mathbf{t}$ (here and in Eqs. 8.5 to 8.10, $\boldsymbol{\Psi}$ and $\mathbf{T}$ denote $\boldsymbol{\Psi}_M$ and $\mathbf{T}_M$). Thus, it is necessary to maximize the function

$$f(\boldsymbol{\beta}, \mathbf{w}, \mu, \phi) = \mathbf{w}' \boldsymbol{\Psi} \boldsymbol{\beta} - 0.5 \mu \left( \boldsymbol{\beta}' \mathbf{T} \boldsymbol{\beta} - \sigma_I^2 \right) - 0.5 \phi \left( \mathbf{w}' \boldsymbol{\Psi} \mathbf{w} - \sigma_H^2 \right) \tag{8.5}$$

with respect to β, w, μ, and ϕ, where μ and ϕ are Lagrange multipliers. The derivatives of Eq. (8.5) with respect to β, w, μ, and ϕ are:

$$\boldsymbol{\Psi} \mathbf{w} - \mu \mathbf{T} \boldsymbol{\beta} = \mathbf{0},\tag{8.6}$$

$$\boldsymbol{\Psi} \boldsymbol{\beta} - \phi \boldsymbol{\Psi} \mathbf{w} = \mathbf{0}, \tag{8.7}$$

$$\boldsymbol{\beta}'\mathbf{T}\boldsymbol{\beta} = \sigma_I^2 \quad \text{and} \quad \mathbf{w}'\boldsymbol{\Psi}\mathbf{w} = \sigma_H^2,\tag{8.8}$$

respectively, where Eq. (8.8) denotes the restrictions imposed for maximizing ρHI. It can be shown (see Chap. 7) that vector w can be obtained as

$$\mathbf{w}_M = \boldsymbol{\Psi}_M^{-1} \mathbf{T}_M \boldsymbol{\beta} \tag{8.9}$$

and the net genetic merit in the MESIM context can be written as $H_M = \mathbf{w}_M'\mathbf{a}$; thus, the correlation between $H_M = \mathbf{w}_M'\mathbf{a}$ and $I$ is $\rho_{H_M I} = \dfrac{\sqrt{\boldsymbol{\beta}'\mathbf{T}\boldsymbol{\beta}}}{\sqrt{\boldsymbol{\beta}'\mathbf{T}\boldsymbol{\Psi}^{-1}\mathbf{T}\boldsymbol{\beta}}}$ and the MESIM vector of coefficients ($\boldsymbol{\beta}$) that maximizes $\rho_{H_M I}$ can be obtained from equation

$$\left(\mathbf{T}^{-1}\boldsymbol{\Psi} - \lambda_M^2 \mathbf{I}_{2t}\right)\boldsymbol{\beta}_M = \mathbf{0},\tag{8.10}$$

where $\mathbf{I}_{2t}$ is an identity matrix of size $2t \times 2t$ ($t$ is the number of traits), and $\lambda_M^2$ and $\boldsymbol{\beta}_M$ are the eigenvalue and eigenvector of matrix $\mathbf{T}_M^{-1}\boldsymbol{\Psi}_M$. The words eigenvalue and eigenvector are derived from the German word eigen, which means owned by or peculiar to. Eigenvalues and eigenvectors are sometimes called characteristic values and characteristic vectors, proper values and proper vectors, or latent values and latent vectors (Meyer 2000). The square root of $\lambda_M^2$ ($\lambda_M$) is the canonical correlation between $H_M = \mathbf{w}_M'\mathbf{a}$ and $I_M = \boldsymbol{\beta}_M'\mathbf{t}$, and the optimized MESIM index can be written as $I_M = \boldsymbol{\beta}_M'\mathbf{t}$. Using a similar procedure to that described in Chap. 7 (Eq. 7.17), it can be shown that vector $\boldsymbol{\beta}_M$ can be transformed into $\boldsymbol{\beta}_C = \mathbf{F}\boldsymbol{\beta}_M$, where $\mathbf{F}$ is a diagonal matrix whose diagonal values can be any real numbers except zero.
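The step from the derivatives to Eqs. (8.9) and (8.10) is only referenced above. One way to sketch it, assuming that $\boldsymbol{\Psi}$ and $\mathbf{T}$ are nonsingular, starts from the two stationarity conditions $\boldsymbol{\Psi}\mathbf{w} - \mu\mathbf{T}\boldsymbol{\beta} = \mathbf{0}$ (Eq. 8.6) and $\boldsymbol{\Psi}\boldsymbol{\beta} - \phi\boldsymbol{\Psi}\mathbf{w} = \mathbf{0}$ (Eq. 8.7):

$$\boldsymbol{\Psi}\mathbf{w} = \mu\mathbf{T}\boldsymbol{\beta} \;\Rightarrow\; \mathbf{w} = \mu\,\boldsymbol{\Psi}^{-1}\mathbf{T}\boldsymbol{\beta},$$

$$\boldsymbol{\Psi}\boldsymbol{\beta} = \phi\,\boldsymbol{\Psi}\mathbf{w} = \mu\phi\,\mathbf{T}\boldsymbol{\beta} \;\Rightarrow\; \left(\mathbf{T}^{-1}\boldsymbol{\Psi} - \mu\phi\,\mathbf{I}_{2t}\right)\boldsymbol{\beta} = \mathbf{0}.$$

The first line gives Eq. (8.9) when $\mu$ is treated as a proportionality constant, and the second line has the form of Eq. (8.10) with $\lambda_M^2 = \mu\phi$.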

The maximized correlation between $H\_M = \mathbf{w}'\_M\mathbf{a}$ and $I\_M = \boldsymbol{\beta}'\_M\mathbf{t}$, or MESIM accuracy, is

$$\rho\_{H\_M I\_M} = \frac{\sqrt{\boldsymbol{\beta}\_M' \mathbf{T}\_M \boldsymbol{\beta}\_M}}{\sqrt{\boldsymbol{\beta}\_M' \mathbf{T}\_M \boldsymbol{\Psi}\_M^{-1} \mathbf{T}\_M \boldsymbol{\beta}\_M}} = \frac{\sigma\_{I\_M}}{\sigma\_{H\_M}},\tag{8.11}$$

where $\sigma\_{I\_M} = \sqrt{\boldsymbol{\beta}'\_M\mathbf{T}\_M\boldsymbol{\beta}\_M}$ is the standard deviation of $I\_M = \boldsymbol{\beta}'\_M\mathbf{t}$, and $\sigma\_{H\_M} = \sqrt{\boldsymbol{\beta}'\_M\mathbf{T}\_M\boldsymbol{\Psi}\_M^{-1}\mathbf{T}\_M\boldsymbol{\beta}\_M}$ is the standard deviation of $H\_M = \mathbf{w}'\_M\mathbf{a}$.
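The statement that $\lambda\_M$ equals the MESIM accuracy follows in one step from Eqs. (8.10) and (8.11). A short sketch of that step, assuming that $\boldsymbol{\Psi}\_M$ is nonsingular and $\lambda\_M^2 > 0$, is

$$\boldsymbol{\Psi}\_M\boldsymbol{\beta}\_M = \lambda\_M^2\,\mathbf{T}\_M\boldsymbol{\beta}\_M \;\Rightarrow\; \boldsymbol{\Psi}\_M^{-1}\mathbf{T}\_M\boldsymbol{\beta}\_M = \lambda\_M^{-2}\,\boldsymbol{\beta}\_M \;\Rightarrow\; \boldsymbol{\beta}\_M'\mathbf{T}\_M\boldsymbol{\Psi}\_M^{-1}\mathbf{T}\_M\boldsymbol{\beta}\_M = \lambda\_M^{-2}\,\boldsymbol{\beta}\_M'\mathbf{T}\_M\boldsymbol{\beta}\_M,$$

so that

$$\rho\_{H\_M I\_M}^2 = \frac{\boldsymbol{\beta}\_M'\mathbf{T}\_M\boldsymbol{\beta}\_M}{\lambda\_M^{-2}\,\boldsymbol{\beta}\_M'\mathbf{T}\_M\boldsymbol{\beta}\_M} = \lambda\_M^2 .$$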

The maximized selection response and expected genetic gain per trait of MESIM are

$$R\_M = k\_I \sqrt{\boldsymbol{\beta}\_{M\_1}' \mathbf{T}\_M \boldsymbol{\beta}\_{M\_1}} \tag{8.12}$$

and

$$\mathbf{E}\_M = k\_I \frac{\boldsymbol{\Psi}\_M \boldsymbol{\beta}\_{M\_1}}{\sqrt{\boldsymbol{\beta}\_{M\_1}' \mathbf{T}\_M \boldsymbol{\beta}\_{M\_1}}},\tag{8.13}$$

respectively, where $\boldsymbol{\beta}\_{M\_1}$ is the first eigenvector of matrix $\mathbf{T}\_M^{-1}\boldsymbol{\Psi}\_M$. If vector $\boldsymbol{\beta}\_{M\_1}$ is multiplied by matrix $\mathbf{F}$, we obtain $\boldsymbol{\beta}\_{C\_1} = \mathbf{F}\boldsymbol{\beta}\_{M\_1}$; in this case, we can replace $\boldsymbol{\beta}\_{M\_1}$ with $\boldsymbol{\beta}\_{C\_1}$ in Eqs. (8.12) and (8.13), and the optimized MESIM index should be written as $I\_M = \boldsymbol{\beta}'\_{C\_1}\mathbf{t}$.

### 8.1.2 Estimating MESIM Parameters

We estimate the MESIM parameters using the same procedure described in Chap. 7 (Sect. 7.1.4) to estimate the ESIM parameters. Let $\widehat{\mathbf{C}}$, $\widehat{\mathbf{P}}$, and $\widehat{\mathbf{S}}\_M$ be the estimates of the genotypic, phenotypic, and marker scores covariance matrices, $\widehat{\mathbf{T}}\_M = \begin{bmatrix} \widehat{\mathbf{P}} & \widehat{\mathbf{S}}\_M \\ \widehat{\mathbf{S}}\_M & \widehat{\mathbf{S}}\_M \end{bmatrix}$ and $\widehat{\boldsymbol{\Psi}}\_M = \begin{bmatrix} \widehat{\mathbf{C}} & \widehat{\mathbf{S}}\_M \\ \widehat{\mathbf{S}}\_M & \widehat{\mathbf{S}}\_M \end{bmatrix}$ the estimated block matrices (Chap. 4), and $\widehat{\mathbf{W}} = \widehat{\mathbf{T}}\_M^{-1}\widehat{\boldsymbol{\Psi}}\_M$; then, to find the estimators $\widehat{\boldsymbol{\beta}}\_{M\_1}$ and $\widehat{\lambda}^2\_{M\_1}$ of the first eigenvector ($\boldsymbol{\beta}\_{M\_1}$) and the first eigenvalue ($\lambda^2\_{M\_1}$) respectively, we need to solve the equation

$$(\widehat{\mathbf{W}}\widehat{\mathbf{W}}' - \widehat{\mu}\_j \mathbf{I})\widehat{\boldsymbol{\beta}}\_{M\_j} = \mathbf{0},\tag{8.14}$$

where $\widehat{\mu}\_j = \widehat{\lambda}^4\_{M\_j}$, $j = 1, 2, \ldots, 2t$. For additional details, see Eqs. (7.22) and (7.23), and Sect. 7.1.5 of Chap. 7. The results of Eq. (8.14) allow the MESIM index ($I\_M = \boldsymbol{\beta}'\_{M\_1}\mathbf{t}$) to be estimated as $\widehat{I}\_M = \widehat{\boldsymbol{\beta}}'\_{M\_1}\mathbf{t}$, whereas the estimators of the maximized MESIM selection response and its expected genetic gain per trait can be denoted by

$$
\widehat{R}\_{M} = k\_{I} \sqrt{\widehat{\boldsymbol{\beta}}\_{M\_{1}}^{\prime} \widehat{\mathbf{T}}\_{M} \widehat{\boldsymbol{\beta}}\_{M\_{1}}} \text{ and } \widehat{\mathbf{E}}\_{M} = k\_{I} \frac{\widehat{\boldsymbol{\Psi}}\_{M} \widehat{\boldsymbol{\beta}}\_{M\_{1}}}{\sqrt{\widehat{\boldsymbol{\beta}}\_{M\_{1}}^{\prime} \widehat{\mathbf{T}}\_{M} \widehat{\boldsymbol{\beta}}\_{M\_{1}}}}, \tag{8.15}
$$

respectively.

### 8.1.3 Numerical Examples

To validate the MESIM theoretical results, we use a real maize (Zea mays) F2 population with 247 genotypes (each with two repetitions), 195 molecular markers, and two traits, plant height (PHT, cm) and ear height (EHT, cm), evaluated in one environment. We coded the marker homozygous loci for the allele from the first parental line by 1, whereas the marker homozygous loci for the allele from the second parental line were coded by $-1$ and the marker heterozygous loci by 0. The estimated phenotypic, genetic, and marker scores covariance matrices were $\widehat{\mathbf{P}} = \begin{bmatrix} 191.81 & 106.89 \\ 106.89 & 167.93 \end{bmatrix}$, $\widehat{\mathbf{C}} = \begin{bmatrix} 83.00 & 57.44 \\ 57.44 & 59.80 \end{bmatrix}$, and $\widehat{\mathbf{S}}\_M = \begin{bmatrix} 15.750 & 0.983 \\ 0.983 & 28.083 \end{bmatrix}$ respectively, and the vector of economic weights was $\mathbf{a}' = \begin{bmatrix} \mathbf{w}' & \mathbf{0}' \end{bmatrix}$, where $\mathbf{w}' = \begin{bmatrix} 1 & 1 \end{bmatrix}$ and $\mathbf{0}' = \begin{bmatrix} 0 & 0 \end{bmatrix}$. Details of how to estimate the marker scores and their variance were given in Chap. 4.

We compare LMSI versus MESIM efficiency. The estimated LMSI vector of coefficients was $\widehat{\boldsymbol{\beta}}' = \mathbf{a}'\widehat{\boldsymbol{\Psi}}\_M\widehat{\mathbf{T}}\_M^{-1} = \begin{bmatrix} 0.59 & 0.18 & 0.41 & 0.82 \end{bmatrix}$. Using a 10% selection intensity ($k\_I = 1.755$), the estimated LMSI selection response and the expected genetic gain per trait were $\widehat{R} = k\_I\sqrt{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{T}}\_M\widehat{\boldsymbol{\beta}}} = 20.41$ and $\widehat{\mathbf{E}}' = k\_I\dfrac{\widehat{\boldsymbol{\beta}}'\widehat{\boldsymbol{\Psi}}\_M}{\sqrt{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{T}}\_M\widehat{\boldsymbol{\beta}}}} = \begin{bmatrix} 10.09 & 10.31 & 2.53 & 4.39 \end{bmatrix}$ respectively, whereas the estimated LMSI accuracy was $\widehat{\rho}\_{H\widehat{I}} = \dfrac{\widehat{\sigma}\_I}{\widehat{\sigma}\_H} = 0.72$.
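The LMSI figures above can be checked numerically. The following is a minimal NumPy sketch, not the authors' software: it rebuilds the block matrices from the three reported covariance matrices and evaluates the LMSI formulas. The variable names are hypothetical, and the economic weights are taken as printed.

```python
import numpy as np

# Estimated covariance matrices reported for the maize F2 example (PHT, EHT).
P = np.array([[191.81, 106.89], [106.89, 167.93]])   # phenotypic
C = np.array([[83.00, 57.44], [57.44, 59.80]])       # genetic
S = np.array([[15.750, 0.983], [0.983, 28.083]])     # marker scores

# Block matrices T_M = [[P, S], [S, S]] and Psi_M = [[C, S], [S, S]].
T = np.block([[P, S], [S, S]])
Psi = np.block([[C, S], [S, S]])

# Economic weights a' = [w' 0'] with w' = [1 1], taken as printed (assumption).
a = np.array([1.0, 1.0, 0.0, 0.0])
k_I = 1.755  # selection intensity for 10% selection

# LMSI coefficients: beta = T_M^{-1} Psi_M a (solve the system, no explicit inverse).
beta = np.linalg.solve(T, Psi @ a)

sigma_I = np.sqrt(beta @ T @ beta)   # standard deviation of the index
sigma_H = np.sqrt(a @ Psi @ a)       # standard deviation of the net genetic merit
R = k_I * sigma_I                    # selection response
E = k_I * (Psi @ beta) / sigma_I     # expected genetic gain per trait
rho = sigma_I / sigma_H              # accuracy

print(np.round(beta, 2), round(R, 2), np.round(E, 2), round(rho, 2))
```

Solving the linear system rather than inverting $\widehat{\mathbf{T}}\_M$ is a numerical convenience only; both routes give the same coefficients.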

Vector $\widehat{\boldsymbol{\beta}}'\_{M\_1} = \begin{bmatrix} 0.089 & 0.061 & -0.536 & 0.837 \end{bmatrix}$ was the original estimated MESIM vector of coefficients. Using matrix $\mathbf{F} = \begin{bmatrix} 0.10 & 0 & 0 & 0 \\ 0 & 0.10 & 0 & 0 \\ 0 & 0 & 0.75 & 0 \\ 0 & 0 & 0 & 0.75 \end{bmatrix}$, vector $\widehat{\boldsymbol{\beta}}'\_{M\_1}$ was transformed as $\widehat{\boldsymbol{\beta}}'\_{C\_1} = \widehat{\boldsymbol{\beta}}'\_{M\_1}\mathbf{F} = \begin{bmatrix} 0.009 & 0.006 & -0.402 & 0.628 \end{bmatrix}$ and then the estimated MESIM index was $\widehat{I}\_M = 0.009\,\mathrm{PHT} + 0.006\,\mathrm{EHT} - 0.402\,S\_{PHT} + 0.628\,S\_{EHT}$, where $S\_{PHT}$ and $S\_{EHT}$ denote the marker scores associated with PHT and EHT respectively. The estimated MESIM expected genetic gain, selection response, and accuracy were

$$\widehat{\mathbf{E}}'\_M = k\_I \frac{\widehat{\boldsymbol{\beta}}'\_{C\_1}\widehat{\boldsymbol{\Psi}}\_M}{\sqrt{\widehat{\boldsymbol{\beta}}'\_{C\_1}\widehat{\mathbf{T}}\_M\widehat{\boldsymbol{\beta}}\_{C\_1}}} = \begin{bmatrix} -3.438 & -8.516 & -3.319 & -8.372 \end{bmatrix}, \quad \widehat{R}\_M = k\_I \sqrt{\widehat{\boldsymbol{\beta}}'\_{C\_1}\widehat{\mathbf{T}}\_M\widehat{\boldsymbol{\beta}}\_{C\_1}} = 6.573, \quad \text{and} \quad \widehat{\rho}\_{H\_M I\_M} = \frac{\widehat{\sigma}\_{I\_M}}{\widehat{\sigma}\_{H\_M}} = 0.99$$

respectively.
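The estimation step of Eq. (8.14) can be sketched with NumPy for the same maize matrices. This is a minimal illustration rather than the authors' implementation: it assumes that the first eigenvector is the unit-length eigenvector of $\widehat{\mathbf{W}}\widehat{\mathbf{W}}'$ with the largest eigenvalue, and the variable names are hypothetical. Because the overall sign of an eigenvector is arbitrary and the rescaling by $\mathbf{F}$ is not applied here, only the magnitudes of the entries are compared with the reported $\widehat{\boldsymbol{\beta}}\_{M\_1}$.

```python
import numpy as np

# Estimated covariance matrices reported for the maize F2 example (PHT, EHT).
P = np.array([[191.81, 106.89], [106.89, 167.93]])   # phenotypic
C = np.array([[83.00, 57.44], [57.44, 59.80]])       # genetic
S = np.array([[15.750, 0.983], [0.983, 28.083]])     # marker scores

T = np.block([[P, S], [S, S]])      # T_M
Psi = np.block([[C, S], [S, S]])    # Psi_M
k_I = 1.755

# W = T_M^{-1} Psi_M, then the symmetric matrix W W' that appears in Eq. (8.14).
W = np.linalg.solve(T, Psi)
mu, vecs = np.linalg.eigh(W @ W.T)  # eigh returns eigenvalues in ascending order

# First eigenvector: the one paired with the largest eigenvalue mu_1.
mu_1 = mu[-1]
beta_M1 = vecs[:, -1]

# Plug the first eigenvector into the estimators of Eq. (8.15).
sigma_I = np.sqrt(beta_M1 @ T @ beta_M1)
R_M = k_I * sigma_I
E_M = k_I * (Psi @ beta_M1) / sigma_I

print(np.round(np.abs(beta_M1), 3))
```

Multiplying `beta_M1` by a diagonal matrix such as $\mathbf{F}$ changes the scale of the estimated response and gains, which is why the values printed by this sketch are not expected to coincide with those reported for $\widehat{\boldsymbol{\beta}}\_{C\_1}$.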
 

The inner products ($\widehat{\boldsymbol{\beta}}'\widehat{\boldsymbol{\beta}}$) of the estimated LMSI and MESIM vectors of coefficients were 1.221 and 0.556 respectively, whence the estimated LMSI selection response (20.41) divided by 1.221 was 16.716, and the estimated MESIM selection response (6.573) divided by 0.556 was 11.821. That is, the estimated LMSI selection response was higher than the estimated MESIM selection response for this data set. Similar results were found when we compared the estimated LMSI expected genetic gain per trait with the estimated MESIM expected genetic gain per trait. Finally, Fig. 8.1 presents the frequency distribution of the 247 estimated MESIM values for the real data set described earlier, which approaches a normal distribution, as we would expect.
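The normalization used in this comparison is simple arithmetic and can be reproduced as follows. This is a small sketch with hypothetical variable names; the coefficient vectors are entered as rounded magnitudes, since the sign of an entry does not affect an inner product.

```python
import numpy as np

# Rounded coefficient vectors from the numerical example (magnitudes only).
beta_lmsi = np.array([0.59, 0.18, 0.41, 0.82])
beta_mesim = np.array([0.009, 0.006, 0.402, 0.628])

# Reported selection responses before normalization.
R_lmsi, R_mesim = 20.41, 6.573

# Inner product of each coefficient vector with itself.
ip_lmsi = beta_lmsi @ beta_lmsi
ip_mesim = beta_mesim @ beta_mesim

# Responses divided by the inner products, as done in the text.
R_lmsi_scaled = R_lmsi / ip_lmsi
R_mesim_scaled = R_mesim / ip_mesim

print(round(ip_lmsi, 3), round(ip_mesim, 3))
print(round(R_lmsi_scaled, 2), round(R_mesim_scaled, 2))
```

The second scaled value differs from the reported one in the third decimal place only because the text divides by the rounded inner product 0.556.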

Now with a selection intensity of 10% ($k\_I = 1.755$), we compare the LMSI and MESIM efficiency using the simulated data set described in Sect. 2.8.1 of Chap. 2 for four phenotypic selection cycles, each with four traits (T1, T2, T3, and T4), 500 genotypes, and four replicates of each genotype. The economic weights for T1, T2, T3, and T4 were 1, 1, 1, and 1 respectively. For this data set, we did not use the linear transformation $\widehat{\boldsymbol{\beta}}\_{C\_1} = \mathbf{F}\widehat{\boldsymbol{\beta}}\_{M\_1}$.

The estimated selection responses of the linear marker, combined genomic, and genome-wide selection indices (LMSI, CLGSI, and GW-LMSI respectively; see Chaps. 4 and 5 for details) for four simulated selection cycles, when their vectors of coefficients were normalized, are presented in Table 8.1. This table also shows the selection responses of the estimated linear molecular, genomic, and genome-wide eigen selection index methods (MESIM, GESIM, and GW-ESIM respectively; details in Sect. 8.2) for four simulated selection cycles. The average of the estimated LMSI selection responses was 2.22, whereas the average of the estimated MESIM selection responses was 1.69. The estimated LMSI selection response was higher than that of the MESIM.

Table 8.2 presents the estimated LMSI and MESIM expected genetic gains for four traits (T1, T2, T3, and T4) and their associated marker scores (S1, S2, S3, and S4) for four simulated selection cycles. The averages of the estimated LMSI expected genetic gains for the four traits and their associated marker scores were 12.74, 2.10, 1.60, 0.94, 5.70, 2.19, 0.71, and 0.64 respectively, whereas the averages of the estimated MESIM expected genetic gains for the four traits and their associated marker scores were 14.40, 0.38, 0.39, 0.34, 8.65, 0.47, 0.21, and 0.70 respectively. Except for trait T1 and its associated marker score, the estimated LMSI expected genetic gains per trait were higher than the estimated MESIM expected genetic gains. Thus, for this data set, LMSI efficiency was greater than MESIM efficiency.

Fig. 8.1 Frequency distribution of 247 estimated molecular eigen selection index method (MESIM) values for one selection cycle in an environment for a real maize (Zea mays) F2 population with 195 molecular markers and two traits, plant height (PHT, cm) and ear height (EHT, cm), and their associated marker scores $S\_{PHT}$ and $S\_{EHT}$ respectively

Table 8.1 Estimated linear molecular, combined genomic, and genome-wide selection index (LMSI, CLGSI, and GW-LMSI respectively) selection responses when their vectors of coefficients are normalized for four simulated selection cycles. Estimated linear molecular, genomic, and genome-wide eigen selection index method (MESIM, GESIM, and GW-ESIM respectively) selection responses for four simulated selection cycles. The selection intensity was 10% ($k\_I = 1.755$)

Table 8.2 Estimated linear molecular selection index (LMSI) and estimated linear molecular eigen selection index method (MESIM) expected genetic gains for four traits (T1, T2, T3, and T4) and their associated marker scores (S1, S2, S3, and S4) for four simulated selection cycles. The selection intensity was 10% ($k\_I = 1.755$)

Chapter 11 presents RIndSel, a user-friendly graphical user interface written in Java that is useful for estimating the LMSI and ESIM parameters and selecting parents for the next selection cycle.

#### 8.2 The Linear Genomic Eigen Selection Index Method

The linear genomic eigen selection index method (GESIM) is based on the standard CLGSI described in Chap. 5, and uses genomic estimated breeding values (GEBVs) and phenotypic values jointly to predict the net genetic merit. Thus, conditions for constructing a valid GESIM are the same as those for constructing the CLGSI. Also, the MESIM theory described in Sect. 8.1 is directly applied to the GESIM and only minor changes are necessary in GESIM theory. For example, instead of marker scores, the GESIM uses GEBVs to predict the net genetic merit; thus, the details of the estimation process are the same as for the MESIM.

### 8.2.1 The GESIM Parameters

In the GESIM context, the net genetic merit can be written as

$$H = \mathbf{w}\_1' \mathbf{g} + \mathbf{w}\_2' \boldsymbol{\gamma} = \begin{bmatrix} \mathbf{w}\_1' & \mathbf{w}\_2' \end{bmatrix} \begin{bmatrix} \mathbf{g} \\ \boldsymbol{\gamma} \end{bmatrix} = \mathbf{w}' \boldsymbol{\alpha},\tag{8.16}$$

where $\mathbf{g}' = \begin{bmatrix} g\_1 & \cdots & g\_t \end{bmatrix}$ is the vector of true breeding values, $t$ is the number of traits, $\mathbf{w}'\_1 = \begin{bmatrix} w\_1 & \cdots & w\_t \end{bmatrix}$ is a vector of unknown economic weights associated with $\mathbf{g}$, $\mathbf{w}'\_2 = \begin{bmatrix} 0\_1 & \cdots & 0\_t \end{bmatrix}$ is a null vector associated with the vector of genomic breeding values $\boldsymbol{\gamma}' = \begin{bmatrix} \gamma\_1 & \gamma\_2 & \cdots & \gamma\_t \end{bmatrix}$, $\mathbf{w}' = \begin{bmatrix} \mathbf{w}'\_1 & \mathbf{w}'\_2 \end{bmatrix}$, and $\boldsymbol{\alpha}' = \begin{bmatrix} \mathbf{g}' & \boldsymbol{\gamma}' \end{bmatrix}$. The estimator of $\boldsymbol{\gamma}$ is the GEBV (see Chap. 5 for additional details). The GESIM index can be written as

$$I = \boldsymbol{\beta}'\_{y} \mathbf{y} + \boldsymbol{\beta}'\_{\gamma} \boldsymbol{\gamma} = \begin{bmatrix} \boldsymbol{\beta}'\_{y} & \boldsymbol{\beta}'\_{\gamma} \end{bmatrix} \begin{bmatrix} \mathbf{y} \\ \boldsymbol{\gamma} \end{bmatrix} = \boldsymbol{\beta}' \mathbf{f}, \tag{8.17}$$

where $\mathbf{y}' = \begin{bmatrix} y\_1 & \cdots & y\_t \end{bmatrix}$ is the vector of phenotypic values; $\boldsymbol{\beta}\_y$ and $\boldsymbol{\beta}\_\gamma$ are the vectors of weights of the phenotypic and genomic breeding values respectively; $\boldsymbol{\beta}' = \begin{bmatrix} \boldsymbol{\beta}'\_y & \boldsymbol{\beta}'\_\gamma \end{bmatrix}$ and $\mathbf{f}' = \begin{bmatrix} \mathbf{y}' & \boldsymbol{\gamma}' \end{bmatrix}$.

Let $\operatorname{Var}(H) = \mathbf{w}'\mathbf{A}\mathbf{w} = \sigma\_H^2$ be the variance of $H = \mathbf{w}'\boldsymbol{\alpha}$, $\operatorname{Var}(I) = \boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta} = \sigma\_I^2$ the variance of $I = \boldsymbol{\beta}'\mathbf{f}$, and $\operatorname{Cov}(H, I) = \mathbf{w}'\mathbf{A}\boldsymbol{\beta} = \sigma\_{HI}$ the covariance between $H$ and $I$, where $\mathbf{A} = \operatorname{Var}\begin{bmatrix} \mathbf{g} \\ \boldsymbol{\gamma} \end{bmatrix} = \begin{bmatrix} \mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ and $\boldsymbol{\Phi} = \operatorname{Var}\begin{bmatrix} \mathbf{y} \\ \boldsymbol{\gamma} \end{bmatrix} = \begin{bmatrix} \mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ are $2t \times 2t$ block matrices ($t$ is the number of traits) of covariance matrices, and $\mathbf{P}$, $\boldsymbol{\Gamma}$, and $\mathbf{C}$ are the covariance matrices of phenotypic ($\mathbf{y}$), genomic ($\boldsymbol{\gamma}$), and genetic ($\mathbf{g}$) values respectively. Then, $\rho\_{HI} = \dfrac{\mathbf{w}'\mathbf{A}\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\mathbf{A}\mathbf{w}}\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}$ is the correlation between $H = \mathbf{w}'\boldsymbol{\alpha}$ and $I = \boldsymbol{\beta}'\mathbf{f}$, and the GESIM selection response can be written as

$$R = k\_I \sigma\_H \rho\_{HI},\tag{8.18}$$

where $k\_I$ is the standardized selection differential (or selection intensity) associated with the GESIM and $\sigma\_H = \sqrt{\mathbf{w}'\mathbf{A}\mathbf{w}}$ is the standard deviation of $H$. It is assumed that $k\_I$ is fixed, and that matrices $\boldsymbol{\Phi}$ and $\mathbf{A}$ are known; then, we can maximize $R$ by maximizing $\rho\_{HI}$ with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ under the restrictions $\sigma\_H^2 = \mathbf{w}'\mathbf{A}\mathbf{w}$, $\sigma\_I^2 = \boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}$, and $0 < \sigma\_H^2, \sigma\_I^2 < \infty$, similar to the MESIM.

It can be shown that the vector w in the GESIM context is

$$\mathbf{w}\_G = \mathbf{A}^{-1} \boldsymbol{\Phi} \boldsymbol{\beta} \tag{8.19}$$

and that the net genetic merit can be written as $H\_G = \mathbf{w}'\_G\boldsymbol{\alpha}$. The correlation between $H\_G = \mathbf{w}'\_G\boldsymbol{\alpha}$ and $I = \boldsymbol{\beta}'\mathbf{f}$ is $\rho\_{H\_G I} = \dfrac{\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}{\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}\boldsymbol{\beta}}}$ and the GESIM index vector of coefficients that maximizes $\rho\_{H\_G I}$ can be obtained from the equation

$$(\boldsymbol{\Phi}^{-1}\mathbf{A} - \lambda\_G^2 \mathbf{I}\_{2t})\boldsymbol{\beta}\_G = \mathbf{0},\tag{8.20}$$

where $\mathbf{I}\_{2t}$ is an identity matrix of size $2t \times 2t$ ($t$ is the number of traits); the optimized GESIM index can be written as $I\_G = \boldsymbol{\beta}'\_G\mathbf{f}$. By Eqs. (8.19) and (8.20), GESIM accuracy can be written as

$$
\rho\_{H\_{G}I\_{G}} = \frac{\sigma\_{I\_{G}}}{\sigma\_{H\_{G}}},\tag{8.21}
$$

where $\sigma\_{I\_G} = \sqrt{\boldsymbol{\beta}'\_G\boldsymbol{\Phi}\boldsymbol{\beta}\_G}$ is the standard deviation of $I\_G = \boldsymbol{\beta}'\_G\mathbf{f}$, and $\sigma\_{H\_G} = \sqrt{\boldsymbol{\beta}'\_G\boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}\boldsymbol{\beta}\_G}$ is the standard deviation of $H\_G = \mathbf{w}'\_G\boldsymbol{\alpha}$. In Eq. (8.20), $\lambda\_G^2 = \rho^2\_{H\_G I\_G}$ is the square of the canonical correlation between $H\_G$ and $I\_G$, and $\boldsymbol{\beta}\_G$ is the canonical vector associated with $\lambda\_G^2 = \rho^2\_{H\_G I\_G}$.

The maximized GESIM selection response and expected genetic gain per trait are

$$R\_G = k\_I \sqrt{\boldsymbol{\beta}\_G' \boldsymbol{\Phi} \boldsymbol{\beta}\_G} \tag{8.22}$$

and

$$\mathbf{E}\_G = k\_I \frac{\mathbf{A} \boldsymbol{\beta}\_G}{\sqrt{\boldsymbol{\beta}\_G' \boldsymbol{\Phi} \boldsymbol{\beta}\_G}},\tag{8.23}$$

respectively, where $\boldsymbol{\beta}\_G$ is the first eigenvector of matrix $\boldsymbol{\Phi}^{-1}\mathbf{A}$. Vector $\boldsymbol{\beta}\_G$ can be transformed as $\boldsymbol{\beta}\_{C\_G} = \mathbf{F}\boldsymbol{\beta}\_G$, where $\mathbf{F}$ is a diagonal matrix defined earlier.

### 8.2.2 Numerical Examples

To compare the CLGSI versus GESIM theoretical results, we use a real maize (Zea mays) F2 population with 244 genotypes (each with two repetitions), 233 molecular markers, and three traits: grain yield (GY, ton ha$^{-1}$), ear height (EHT, cm), and plant height (PHT, cm). We estimated matrices $\mathbf{P}$ and $\mathbf{C}$ using Eqs. (2.22) to (2.24) described in Chap. 2, whence the estimated matrices were $\widehat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix}$ and $\widehat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}$. In a similar manner, we estimated matrix $\boldsymbol{\Gamma}$ by applying Eqs. (5.21) to (5.23) described in Chap. 5 using phenotypic and marker information jointly; the estimated matrix was $\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}$. The selection intensity for making a selection cycle was 10% ($k\_I = 1.755$) and the vector of economic weights was $\mathbf{w}' = \begin{bmatrix} 5 & -0.1 & -0.1 & 0 & 0 & 0 \end{bmatrix}$. To obtain the estimated vectors of coefficients of the CLGSI ($\widehat{\boldsymbol{\beta}} = \widehat{\boldsymbol{\Phi}}^{-1}\widehat{\mathbf{A}}\mathbf{w}$) and GESIM (Eq. 8.20), it is necessary to construct matrices $\widehat{\mathbf{A}} = \begin{bmatrix} \widehat{\mathbf{C}} & \widehat{\boldsymbol{\Gamma}} \\ \widehat{\boldsymbol{\Gamma}} & \widehat{\boldsymbol{\Gamma}} \end{bmatrix}$ and $\widehat{\boldsymbol{\Phi}} = \begin{bmatrix} \widehat{\mathbf{P}} & \widehat{\boldsymbol{\Gamma}} \\ \widehat{\boldsymbol{\Gamma}} & \widehat{\boldsymbol{\Gamma}} \end{bmatrix}$.

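The estimated CLGSI vector of coefficients for the traits GY, EHT, and PHT and their associated GEBVs (GEBV$\_{GY}$, GEBV$\_{EHT}$, and GEBV$\_{PHT}$ respectively) was $\widehat{\boldsymbol{\beta}}' = \begin{bmatrix} 0.08 & -0.02 & -0.01 & 4.92 & -0.08 & -0.09 \end{bmatrix}$, whereas the estimated CLGSI selection response, accuracy, and expected genetic gain per trait were $\widehat{R} = k\_I\sqrt{\widehat{\boldsymbol{\beta}}'\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}} = 1.54$, $\widehat{\rho}\_{HI} = \dfrac{\widehat{\sigma}\_I}{\widehat{\sigma}\_H} = 0.814$, and $\widehat{\mathbf{E}}' = k\_I\dfrac{\widehat{\boldsymbol{\beta}}'\widehat{\mathbf{A}}}{\sqrt{\widehat{\boldsymbol{\beta}}'\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}}} = \begin{bmatrix} 0.36 & 1.04 & 1.70 & 0.36 & 1.53 & 2.38 \end{bmatrix}$ respectively. Finally, $\widehat{I} = 0.08\,\mathrm{GY} - 0.02\,\mathrm{EHT} - 0.01\,\mathrm{PHT} + 4.92\,\mathrm{GEBV}\_{GY} - 0.08\,\mathrm{GEBV}\_{EHT} - 0.09\,\mathrm{GEBV}\_{PHT}$ was the estimated CLGSI.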
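The CLGSI results can be reproduced from the three estimated matrices. The sketch below is illustrative only and uses hypothetical variable names. It assumes economic weights of 5, $-0.1$, and $-0.1$ for GY, EHT, and PHT (and zero for the GEBVs); the negative signs are an assumption, chosen because they reproduce the reported coefficients, response, and accuracy.

```python
import numpy as np

# Estimated covariance matrices for GY, EHT and PHT from the maize F2 example.
P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
G = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])  # Gamma

# Block matrices A = [[C, G], [G, G]] and Phi = [[P, G], [G, G]].
A = np.block([[C, G], [G, G]])
Phi = np.block([[P, G], [G, G]])

# Assumed economic weights (signs are an assumption, see the lead-in).
w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])
k_I = 1.755

# CLGSI coefficients: beta = Phi^{-1} A w.
beta = np.linalg.solve(Phi, A @ w)

sigma_I = np.sqrt(beta @ Phi @ beta)
sigma_H = np.sqrt(w @ A @ w)
R = k_I * sigma_I                  # selection response
rho = sigma_I / sigma_H            # accuracy
E = k_I * (A @ beta) / sigma_I     # expected genetic gain per trait

print(np.round(beta, 2), round(R, 2), round(rho, 3), np.round(E, 2))
```

In this setting the phenotypic and GEBV weights of each trait add up to its economic weight (for example, 0.08 + 4.92 = 5 for GY), which gives a quick consistency check on the printed coefficients.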

The estimated GESIM vector of coefficients, selection response, accuracy, and expected genetic gain per trait were $\widehat{\boldsymbol{\beta}}'\_{G\_1} = \begin{bmatrix} 0.207 & 0.029 & 0.041 & 0.820 & 0.337 & 0.411 \end{bmatrix}$, $\widehat{R}\_G = k\_I\sqrt{\widehat{\boldsymbol{\beta}}'\_{G\_1}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}\_{G\_1}} = 6.288$, $\widehat{\rho}\_{\widehat{H}\_G\widehat{I}\_G} = \dfrac{\sqrt{\widehat{\boldsymbol{\beta}}'\_{G\_1}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}\_{G\_1}}}{\sqrt{\widehat{\boldsymbol{\beta}}'\_{G\_1}\widehat{\boldsymbol{\Phi}}\widehat{\mathbf{A}}^{-1}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}\_{G\_1}}} = 0.9056$, and $\widehat{\mathbf{E}}'\_G = k\_I\dfrac{\widehat{\boldsymbol{\beta}}'\_{G\_1}\widehat{\mathbf{A}}}{\sqrt{\widehat{\boldsymbol{\beta}}'\_{G\_1}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}\_{G\_1}}} = \begin{bmatrix} 0.369 & 5.528 & 9.186 & 0.370 & 5.250 & 8.702 \end{bmatrix}$ respectively. Fig. 8.2 presents the frequency distribution of the 244 estimated GESIM index values for one (Fig. 8.2a) and three traits (Fig. 8.2b) using the real data set described earlier. The frequency distribution of the estimated GESIM index values approaches the normal distribution for both indices.

Now, we compare the estimated CLGSI and GESIM selection response and expected genetic gain per trait using the simulated data set described in Sect. 2.8.1 of Chap. 2 for four phenotypic selection cycles, each with four traits (T1, T2, T3, and T4), 500 genotypes, and four replicates per genotype. The economic weights of T1, T2, T3, and T4 were 1, 1, 1, and 1 respectively and the selection intensity for both indices was 10% ($k\_I = 1.755$). For this data set, matrix $\mathbf{F}$ was an identity matrix of size $8 \times 8$ in all four selection cycles.

Fig. 8.2 Frequency distribution of the 244 estimated genomic eigen selection index method (GESIM) values for the one-trait case (a) and for the three-trait case (b) for one selection cycle in an environment for a real maize (Zea mays) F2 population with 233 molecular markers. Note that the frequency distribution of the estimated GESIM index values approaches normal distribution for both indices

Table 8.3 Estimated combined linear genomic selection index (CLGSI) and estimated GESIM expected genetic gains for four traits (T1, T2, T3, and T4) and their associated genomic estimated breeding values (GEBV1, GEBV2, GEBV3, and GEBV4) for four simulated selection cycles. The selection intensity was 10% ($k\_I = 1.755$)

For this data set, the averages of the estimated CLGSI and GESIM selection responses were 0.68 and 2.74 (Table 8.1) respectively. The estimated CLGSI selection response was lower than the estimated GESIM selection response. Table 8.3 presents the estimated CLGSI and GESIM expected genetic gain for four traits (T1, T2, T3, and T4) and their associated genomic estimated breeding values (GEBV1, GEBV2, GEBV3, and GEBV4) for four simulated selection cycles. The averages of the estimated CLGSI expected genetic gains for the four traits and their associated GEBVs were 7.45, 3.35, 2.68, 1.09, 7.13, 3.68, 3.13, and 2.69 respectively, whereas the averages of the estimated GESIM expected genetic gains for the four traits and their associated GEBVs were 8.18, 3.08, 2.27, 0.71, 7.46, 3.53, 2.86, and 2.39 respectively. The estimated CLGSI and GESIM expected genetic gains per trait were very similar.

#### 8.3 The Genome-Wide Linear Eigen Selection Index Method

The MESIM requires regressing phenotypic values on marker coded values to predict the marker score values for each individual candidate for selection, and then combining the marker scores with phenotypic information using the MESIM to obtain a final prediction of the net genetic merit. In addition, the GESIM requires fitting of a statistical model to estimate all available marker effects in the training population; these estimates are then used to obtain GEBVs, which are predictors of breeding values. Crossa and Cerón-Rojas (2011) extended the ESIM theory to a genome-wide linear molecular ESIM (GW-ESIM) similar to the GW-LMSI described in Chap. 4. The GW-LMSI and GW-ESIM are very similar and only minor changes are necessary in GW-ESIM; for example, instead of estimating the GW-LMSI vector of coefficients according to the LPSI method (Chap. 2), the GW-ESIM vector of coefficients is estimated according to the singular value decomposition (SVD) described in Chap. 7.

### 8.3.1 The GW-ESIM Parameters

In the GW-ESIM context, the net genetic merit can be written as

$$H = \mathbf{w}\_1' \mathbf{g} + \mathbf{w}\_2' \mathbf{m} = \begin{bmatrix} \mathbf{w}\_1' & \mathbf{w}\_2' \end{bmatrix} \begin{bmatrix} \mathbf{g} \\ \mathbf{m} \end{bmatrix} = \mathbf{w}' \mathbf{x},\tag{8.24}$$

where $\mathbf{g}' = \begin{bmatrix} g\_1 & \cdots & g\_t \end{bmatrix}$ is the vector of true breeding values, $t$ is the number of traits, $\mathbf{w}'\_1 = \begin{bmatrix} w\_1 & \cdots & w\_t \end{bmatrix}$ is the vector of unknown economic weights associated with the breeding values; $\mathbf{w}'\_2 = \begin{bmatrix} 0\_1 & \cdots & 0\_N \end{bmatrix}$ is a null vector associated with the vector of marker code values $\mathbf{m}' = \begin{bmatrix} m\_1 & \cdots & m\_N \end{bmatrix}$, where $m\_j$ ($j = 1, 2, \ldots, N$ = number of markers) is the jth marker in the training population; $\mathbf{w}' = \begin{bmatrix} \mathbf{w}'\_1 & \mathbf{w}'\_2 \end{bmatrix}$ and $\mathbf{x}' = \begin{bmatrix} \mathbf{g}' & \mathbf{m}' \end{bmatrix}$. The GW-ESIM index ($I$) combines the phenotypic value and all the marker information of individuals to predict Eq. (8.24) values in each selection cycle and can be written as

$$I = \boldsymbol{\beta}'\_{y} \mathbf{y} + \boldsymbol{\beta}'\_{m} \mathbf{m} = \begin{bmatrix} \boldsymbol{\beta}'\_{y} & \boldsymbol{\beta}'\_{m} \end{bmatrix} \begin{bmatrix} \mathbf{y} \\ \mathbf{m} \end{bmatrix} = \boldsymbol{\beta}' \mathbf{q},\tag{8.25}$$

where $\boldsymbol{\beta}\_y$ and $\boldsymbol{\beta}\_m$ are vectors of phenotypic and marker weights respectively; $\mathbf{y}' = \begin{bmatrix} y\_1 & \cdots & y\_t \end{bmatrix}$ is the vector of phenotypic values; $\mathbf{m}$ was defined in Eq. (8.24); $\boldsymbol{\beta}' = \begin{bmatrix} \boldsymbol{\beta}'\_y & \boldsymbol{\beta}'\_m \end{bmatrix}$ and $\mathbf{q}' = \begin{bmatrix} \mathbf{y}' & \mathbf{m}' \end{bmatrix}$.

Let $\sigma\_I^2 = \boldsymbol{\beta}'\mathbf{Q}\boldsymbol{\beta}$ and $\sigma\_H^2 = \mathbf{w}'\mathbf{X}\mathbf{w}$ be the variances of $I = \boldsymbol{\beta}'\mathbf{q}$ and $H = \mathbf{w}'\mathbf{x}$ respectively, and $\sigma\_{HI} = \mathbf{w}'\mathbf{X}\boldsymbol{\beta}$ the covariance between $I$ and $H$, where $\mathbf{Q} = \operatorname{Var}\begin{bmatrix} \mathbf{y} \\ \mathbf{m} \end{bmatrix} = \begin{bmatrix} \mathbf{P} & \mathbf{G}'\_M \\ \mathbf{G}\_M & \mathbf{M} \end{bmatrix}$ and $\mathbf{X} = \operatorname{Var}\begin{bmatrix} \mathbf{g} \\ \mathbf{m} \end{bmatrix} = \begin{bmatrix} \mathbf{C} & \mathbf{G}'\_M \\ \mathbf{G}\_M & \mathbf{M} \end{bmatrix}$ are block matrices of size $(t + N) \times (t + N)$ ($t$ is the number of traits and $N$ is the number of markers), where $\mathbf{P} = \operatorname{Var}(\mathbf{y})$, $\mathbf{M} = \operatorname{Var}(\mathbf{m})$, and $\mathbf{C} = \operatorname{Var}(\mathbf{g})$ are the covariance matrices of phenotypic ($\mathbf{y}$), coded marker ($\mathbf{m}$), and genetic ($\mathbf{g}$) values respectively, whereas $\mathbf{G}\_M = \operatorname{Cov}(\mathbf{y}, \mathbf{m}) = \operatorname{Cov}(\mathbf{g}, \mathbf{m})$ is the covariance matrix between $\mathbf{y}$ and $\mathbf{m}$, and between $\mathbf{g}$ and $\mathbf{m}$ (for details see Chap. 4); $\mathbf{w}$ and $\boldsymbol{\beta}$ were defined earlier. Note that although the size of matrices $\mathbf{P}$ and $\mathbf{C}$ is $t \times t$, the sizes of matrices $\mathbf{M}$ and $\mathbf{G}\_M$ are $N \times N$ and $N \times t$ respectively. Thus, if the number of markers is very high, the size of matrices $\mathbf{M}$ and $\mathbf{G}\_M$ could also be very high.

In Chap. 4 we described matrix M as

$$\mathbf{M} = \begin{bmatrix} 1 & (1 - 2\theta\_{12}) & \dots & (1 - 2\theta\_{1N}) \\ (1 - 2\theta\_{21}) & 1 & \dots & (1 - 2\theta\_{2N}) \\ \vdots & \vdots & \ddots & \vdots \\ (1 - 2\theta\_{N1}) & (1 - 2\theta\_{N2}) & \dots & 1 \end{bmatrix},\tag{8.26}$$

where $(1 - 2\theta\_{ij})$ and $\theta\_{ij}$ ($i, j = 1, 2, \ldots, N$ = number of markers) are the covariance (or correlation) and the recombination frequency between the ith and jth marker respectively, whereas matrix $\mathbf{G}\_M$ can be written as

$$\mathbf{G}\_{M} = \begin{bmatrix} (1 - 2r\_{11})\alpha\_{11} & (1 - 2r\_{11})\alpha\_{12} & \dots & (1 - 2r\_{1N})\alpha\_{1N\_{Q}} \\ (1 - 2r\_{21})\alpha\_{21} & (1 - 2r\_{22})\alpha\_{22} & \dots & (1 - 2r\_{2N})\alpha\_{2N\_{Q}} \\ \vdots & \vdots & \ddots & \vdots \\ (1 - 2r\_{t1})\alpha\_{t1} & (1 - 2r\_{N2})\alpha\_{t2} & \dots & (1 - 2r\_{NN})\alpha\_{tN\_{Q}} \end{bmatrix},\tag{8.27}$$

where $(1 - 2r\_{ik})\alpha\_{qk}$ ($i = 1, 2, \ldots, N$; $k = 1, 2, \ldots, N\_Q$ = number of quantitative trait loci (QTL); $q = 1, 2, \ldots, t$) is the covariance between the qth trait and the ith marker; $r\_{ik}$ is the recombination frequency between the ith marker and the kth QTL, and $\alpha\_{qk}$ is the effect of the kth QTL over the qth trait.
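The structure of these matrices can be made concrete with a toy example. The sketch below is only an illustration: every number in it (two traits, three markers, the recombination frequencies, and all covariances) is hypothetical and does not come from the data sets used in this chapter. It builds $\mathbf{M}$ from recombination frequencies as in Eq. (8.26) and then assembles the block matrices $\mathbf{Q}$ and $\mathbf{X}$.

```python
import numpy as np

t, N = 2, 3  # hypothetical sizes: two traits, three markers

# Hypothetical recombination frequencies between markers
# (symmetric, with zeros on the diagonal).
theta = np.array([[0.00, 0.10, 0.25],
                  [0.10, 0.00, 0.15],
                  [0.25, 0.15, 0.00]])

# Eq. (8.26): entries 1 - 2*theta_ij, which leaves ones on the diagonal.
M = 1.0 - 2.0 * theta

# Hypothetical trait covariance matrices (t x t) and marker-trait covariances (N x t).
P = np.array([[4.0, 1.0], [1.0, 3.0]])   # phenotypic
C = np.array([[2.0, 0.5], [0.5, 1.5]])   # genetic
G_M = np.array([[0.3, 0.1],
                [0.2, 0.2],
                [0.1, 0.3]])

# Block matrices of size (t + N) x (t + N).
Q = np.block([[P, G_M.T], [G_M, M]])     # Var([y; m])
X = np.block([[C, G_M.T], [G_M, M]])     # Var([g; m])

print(Q.shape, X.shape)
```

The two block matrices differ only in their upper-left $t \times t$ block, so the cost of storing and inverting them is driven by the $N \times N$ marker block when the number of markers is large.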

Let $\rho\_{HI} = \dfrac{\mathbf{w}'\mathbf{X}\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\mathbf{X}\mathbf{w}}\sqrt{\boldsymbol{\beta}'\mathbf{Q}\boldsymbol{\beta}}}$ be the correlation between $I = \boldsymbol{\beta}'\mathbf{q}$ and $H = \mathbf{w}'\mathbf{x}$; then, the GW-ESIM selection response can be written as

$$R = k\_I \sigma\_H \rho\_{HI},\tag{8.28}$$

where $k\_I$ is the standardized selection differential (or selection intensity) associated with GW-ESIM and $\sigma\_H = \sqrt{\mathbf{w}'\mathbf{X}\mathbf{w}}$ is the standard deviation of $H$.

Assuming that $k\_I$ is fixed, and that matrices $\mathbf{Q}$ and $\mathbf{X}$ are known, we can maximize $R$ (Eq. 8.28) by maximizing $\rho\_{HI}$ with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ under the restrictions $\sigma\_H^2 = \mathbf{w}'\mathbf{X}\mathbf{w}$, $\sigma\_I^2 = \boldsymbol{\beta}'\mathbf{Q}\boldsymbol{\beta}$, and $0 < \sigma\_H^2, \sigma\_I^2 < \infty$, similar to the MESIM and GESIM. It can be shown that vector $\mathbf{w}$ can be written as

$$\mathbf{w}\_{W} = \mathbf{X}^{-1} \mathbf{Q} \boldsymbol{\beta} \tag{8.29}$$

and that $H\_W = \mathbf{w}'\_W\mathbf{x}$ is the net genetic merit in the GW-ESIM context. The correlation between $H\_W = \mathbf{w}'\_W\mathbf{x}$ and $I = \boldsymbol{\beta}'\mathbf{q}$ is $\rho\_{H\_W I} = \dfrac{\sqrt{\boldsymbol{\beta}'\mathbf{Q}\boldsymbol{\beta}}}{\sqrt{\boldsymbol{\beta}'\mathbf{Q}\mathbf{X}^{-1}\mathbf{Q}\boldsymbol{\beta}}}$ and the GW-ESIM vector of coefficients ($\boldsymbol{\beta}$) that maximizes $\rho\_{H\_W I}$ can be obtained from the equation

$$(\mathbf{Q}^{-1}\mathbf{X} - \lambda\_W^2 \mathbf{I}\_{(t+N)})\boldsymbol{\beta}\_W = \mathbf{0},\tag{8.30}$$

where $\mathbf{I}\_{(t+N)}$ is an identity matrix of size $(t + N) \times (t + N)$ and $I\_W = \boldsymbol{\beta}'\_W\mathbf{q}$ is the optimized GW-ESIM. The accuracy of the GW-ESIM can be written as

$$\rho\_{H\_W I\_W} = \frac{\sqrt{\boldsymbol{\beta}\_W^{\prime} \mathbf{Q} \boldsymbol{\beta}\_W}}{\sqrt{\boldsymbol{\beta}\_W^{\prime} \mathbf{Q} \mathbf{X}^{-1} \mathbf{Q} \boldsymbol{\beta}\_W}} = \frac{\sigma\_{I\_W}}{\sigma\_{H\_W}},\tag{8.31}$$

where $\sigma\_{I\_W} = \sqrt{\boldsymbol{\beta}'\_W\mathbf{Q}\boldsymbol{\beta}\_W}$ is the standard deviation of $I\_W = \boldsymbol{\beta}'\_W\mathbf{q}$, and $\sigma\_{H\_W} = \sqrt{\boldsymbol{\beta}'\_W\mathbf{Q}\mathbf{X}^{-1}\mathbf{Q}\boldsymbol{\beta}\_W}$ is the standard deviation of $H\_W = \mathbf{w}'\_W\mathbf{x}$. In Eq. (8.30), $\lambda\_W^2 = \rho^2\_{H\_W I\_W}$ is the square of the canonical correlation between $H\_W$ and $I\_W$.

The maximized GW-ESIM selection response and expected genetic gain per trait are

$$R\_W = k\_I \sqrt{\boldsymbol{\beta}\_W^{\prime} \mathbf{Q} \boldsymbol{\beta}\_W} \tag{8.32}$$

and

$$\mathbf{E}\_W = k\_I \frac{\mathbf{X} \boldsymbol{\beta}\_W}{\sqrt{\boldsymbol{\beta}\_W^{\prime} \mathbf{Q} \boldsymbol{\beta}\_W}},\tag{8.33}$$

respectively, where $\boldsymbol{\beta}\_W$ is the first eigenvector of Eq. (8.30).

### 8.3.2 Estimating GW-ESIM Parameters

In Chap. 2, Eqs. (2.22) to (2.24), we described the restricted maximum likelihood method to estimate matrices $\mathbf{C}$ and $\mathbf{P}$, whose estimates can be denoted by $\widehat{\mathbf{C}}$ and $\widehat{\mathbf{P}}$. In Chap. 4, we described how to estimate matrices $\mathbf{M}$ and $\mathbf{G}\_M$, whose estimates can be denoted by $\widehat{\mathbf{M}}$ and $\widehat{\mathbf{G}}\_M$. With these estimates, we constructed the block estimated matrices as

$$
\widehat{\mathbf{Q}} = \begin{bmatrix} \widehat{\mathbf{P}} & \widehat{\mathbf{G}}\_{M}^{\prime} \\ \widehat{\mathbf{G}}\_{M} & \widehat{\mathbf{M}} \end{bmatrix} \text{ and } \widehat{\mathbf{X}} = \begin{bmatrix} \widehat{\mathbf{C}} & \widehat{\mathbf{G}}\_{M}^{\prime} \\ \widehat{\mathbf{G}}\_{M} & \widehat{\mathbf{M}} \end{bmatrix}, \text{ whence we obtained the equation}
$$

$$
(\widehat{\mathbf{Q}}^{-1} \widehat{\mathbf{X}} - \widehat{\lambda}\_{W}^{2} \mathbf{I}) \widehat{\boldsymbol{\beta}}\_{W\_j} = \mathbf{0}, \tag{8.34}
$$

$j = 1, 2, \ldots, (t + N)$, where $(t + N)$ is the number of traits and markers in the GW-ESIM index. Similar to the MESIM, we obtained estimators $\widehat{\boldsymbol{\beta}}\_{W\_1}$ and $\widehat{\lambda}^2\_{W\_1}$ of the first eigenvector $\boldsymbol{\beta}\_{W\_1}$ and the first eigenvalue $\lambda^2\_{W\_1}$ respectively, from the equation

$$(\widehat{\mathbf{E}}\widehat{\mathbf{E}}'-\widehat{\mu}\_{j}\mathbf{I})\widehat{\boldsymbol{\beta}}\_{W\_{j}}=\mathbf{0},\tag{8.35}$$

where $\widehat{\mathbf{E}} = \widehat{\mathbf{Q}}^{-1}\widehat{\mathbf{X}}$ and $\widehat{\mu}\_j = \widehat{\lambda}^4\_{W\_j}$. These results allow the GW-ESIM index, its selection response, and its expected genetic gain per trait to be estimated as $\widehat{I}\_W = \widehat{\boldsymbol{\beta}}'\_{W\_1}\widehat{\mathbf{q}}$, $\widehat{R}\_W = k\_I\sqrt{\widehat{\boldsymbol{\beta}}'\_{W\_1}\widehat{\mathbf{Q}}\widehat{\boldsymbol{\beta}}\_{W\_1}}$, and $\widehat{\mathbf{E}}\_W = k\_I\dfrac{\widehat{\mathbf{X}}\widehat{\boldsymbol{\beta}}\_{W\_1}}{\sqrt{\widehat{\boldsymbol{\beta}}'\_{W\_1}\widehat{\mathbf{Q}}\widehat{\boldsymbol{\beta}}\_{W\_1}}}$ respectively, whereas the estimator of GW-ESIM accuracy is $\widehat{\lambda}\_{W\_1}$.

## 8.3.3 Numerical Examples

We compare the estimated GW-LMSI and GW-ESIM selection responses using the simulated data set described in Sect. 2.8.1 of Chap. 2, with a selection intensity of 10% ($k_I = 1.755$). Table 8.1 presents the estimated GW-LMSI selection response for four simulated selection cycles when their vectors of coefficients are normalized; the average estimated GW-LMSI selection response was 0.87. Table 8.1 also presents the estimated GW-ESIM selection response for four simulated selection cycles; the average of the estimated GW-ESIM selection responses was 0.93. Thus, for this data set, the estimated GW-LMSI and GW-ESIM selection responses were very similar.

#### 8.4 The Restricted Linear Genomic Eigen Selection Index Method

The restricted linear genomic eigen selection index method (RGESIM) is based on the restricted linear phenotypic ESIM (RESIM) theory described in Chap. 7. In the RESIM, the breeder's objective is to improve only $(t - r)$ of $t$ ($r < t$) traits, leaving $r$ of them fixed. The same is true for RGESIM, but in this case we should impose $2r$ restrictions, i.e., we need to fix $r$ traits and their associated $r$ GEBVs to obtain results similar to those obtained with the RESIM (see Chap. 7 for details). This is the main difference between the RGESIM and the RESIM.

It can be shown that $\operatorname{Cov}(I, \boldsymbol{\alpha}) = \mathbf{A}\boldsymbol{\beta}$ is the covariance between the breeding value vector ($\boldsymbol{\alpha}' = [\mathbf{g}'\ \boldsymbol{\gamma}']$) and the GESIM index ($I = \boldsymbol{\beta}'\mathbf{f}$). In the RGESIM, we want some covariances between the linear combinations of $\boldsymbol{\alpha}$ ($\mathbf{U}'_G\boldsymbol{\alpha}$) and $I = \boldsymbol{\beta}'\mathbf{f}$ to be zero, i.e., $\operatorname{Cov}(I, \mathbf{U}'_G\boldsymbol{\alpha}) = \mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$, where $\mathbf{U}'_G$ is a $2(t - 1) \times 2t$ matrix of 1s and 0s (1 indicates that the trait and its associated GEBV are restricted, and 0 indicates that the trait and its GEBV have no restrictions). We can solve this problem by maximizing $\dfrac{\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta}}{\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}$ with respect to vector $\boldsymbol{\beta}$ under the restrictions $\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$ and $\boldsymbol{\beta}'\boldsymbol{\beta} = 1$, similar to the RESIM, or by maximizing the correlation between $H = \mathbf{w}'\boldsymbol{\alpha}$ and $I = \boldsymbol{\beta}'\mathbf{f}$, $\rho_{HI} = \dfrac{\mathbf{w}'\mathbf{A}\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\mathbf{A}\mathbf{w}}\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}$, with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ under the restrictions $\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$, $\sigma^2_H = \mathbf{w}'\mathbf{A}\mathbf{w}$, $\sigma^2_I = \boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}$, and $0 < \sigma^2_H, \sigma^2_I < \infty$, as we did for the GESIM.

### 8.4.1 The RGESIM Parameters

To obtain the RGESIM vector of coefficients, we maximize the function

$$f(\boldsymbol{\beta}, \mathbf{v}') = \frac{\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta}}{\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}} - \mathbf{v}'\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} \tag{8.36}$$

with respect to $\boldsymbol{\beta}$ and $\mathbf{v}'$, where $\mathbf{v}' = [v_1\ v_2\ \cdots\ v_{2(r-1)}]$ is a vector of Lagrange multipliers. The derivatives of function $f(\boldsymbol{\beta}, \mathbf{v}')$ with respect to $\boldsymbol{\beta}$ and $\mathbf{v}'$ can be written as

$$2\left(\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}\right)^{1/2}\mathbf{A}\boldsymbol{\beta} - \left(\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}\right)^{-1/2}\left(\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta}\right)\boldsymbol{\Phi}\boldsymbol{\beta} - \mathbf{A}\mathbf{U}_G\mathbf{v} = \mathbf{0}, \tag{8.37}$$

$$\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}, \tag{8.38}$$

respectively, where Eq. (8.38) denotes the restriction imposed for maximizing Eq. (8.36). Using algebraic methods on Eq. (8.37), we get

$$(\mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A} - \lambda_{RG}^{2}\mathbf{I}_{2t})\boldsymbol{\beta}_{RG} = \mathbf{0}, \tag{8.39}$$

where $\lambda^2_{RG} = h^2_{I_{RG}}$, and $h^2_{I_{RG}}$ is the RGESIM heritability obtained under the restriction $\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$; $\mathbf{K}_{RG} = [\mathbf{I}_{2t} - \mathbf{Q}_{RG}]$, $\mathbf{I}_{2t}$ is an identity matrix of size $2t \times 2t$, and $\mathbf{Q}_{RG} = \boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\left(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\right)^{-1}\mathbf{U}'_G\mathbf{A}$. When $\mathbf{U}'_G$ is a null matrix, $\boldsymbol{\beta}'_{RG} = \boldsymbol{\beta}'_G$ (the vector of the GESIM coefficients); thus, the RGESIM is more general than the GESIM and includes the GESIM as a particular case. The RGESIM index $I_{RG} = \boldsymbol{\beta}'_{RG}\mathbf{f}$ and its selection response and expected genetic gain per trait use the first eigenvector of matrix $\mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A}$. It can be shown that the vector of coefficients of $H = \mathbf{w}'_{RG}\boldsymbol{\alpha}$ in the RGESIM can be written as

$$\mathbf{w}_{RG} = \mathbf{A}^{-1}\left[\boldsymbol{\Phi} + \mathbf{Q}'_{RG}\mathbf{A}\right]\boldsymbol{\beta}_{RG}, \tag{8.40}$$

where $\mathbf{Q}'_{RG} = \mathbf{A}\mathbf{U}_G\left(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\right)^{-1}\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}$.

Note that the restriction $\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$ can be written as $\boldsymbol{\beta}'\mathbf{A}\mathbf{U}_G = \mathbf{0}$; this means that $\boldsymbol{\beta}'\mathbf{Q}'_{RG} = \mathbf{0}$ and that the covariance between $H_{RG} = \mathbf{w}'_{RG}\boldsymbol{\alpha}$ and $I_{RG} = \boldsymbol{\beta}'_{RG}\mathbf{f}$ ($\sigma_{H_{RG}I_{RG}}$) can be written as

$$
\sigma_{H_{RG}I_{RG}} = \mathbf{w}'_{RG}\mathbf{A}\boldsymbol{\beta}_{RG} = \boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG} + \boldsymbol{\beta}'_{RG}\mathbf{Q}'_{RG}\mathbf{A}\boldsymbol{\beta}_{RG} = \boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}. \tag{8.41}
$$

Equation (8.41) indicates that $\sigma_{H_{RG}I_{RG}}$ is equal to the variance of $I_{RG} = \boldsymbol{\beta}'_{RG}\mathbf{f}$ ($\sigma^2_{I_{RG}} = \boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}$); therefore, the maximized correlation between $I_{RG}$ and $H_{RG}$, or RGESIM accuracy, can be written as

$$
\rho_{H_{RG}I_{RG}} = \frac{\sqrt{\boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}}}{\sqrt{\mathbf{w}'_{RG}\mathbf{A}\mathbf{w}_{RG}}}, \tag{8.42}
$$

where $\mathbf{w}'_{RG}\mathbf{A}\mathbf{w}_{RG}$ is the variance of $H_{RG}$. Hereafter, to simplify the notation, we write Eq. (8.42) as $\lambda_{RG}$.

The maximized selection response and the expected genetic gain per trait of the RGESIM are

$$R_{RG} = k_I\sqrt{\boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}} \tag{8.43}$$

and

$$\mathbf{E}_{RG} = k_I\frac{\mathbf{A}\boldsymbol{\beta}_{RG}}{\sqrt{\boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}}}, \tag{8.44}$$

respectively, where $\boldsymbol{\beta}_{RG}$ is the first eigenvector of matrix $\mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A}$.
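Equations (8.39), (8.43), and (8.44) can be illustrated with a short numerical sketch. It is a minimal example under stated assumptions, not the authors' software: the function name `rgesim` is hypothetical, the input matrices are the rounded maize estimates quoted in Sect. 8.4.3, and a general eigen-solver is used on $\mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A}$ instead of the SVD route mentioned in the text. The restriction here is on GY and its GEBV.

```python
import numpy as np

# Rounded maize estimates quoted in Sect. 8.4.3 (traits GY, EHT, PHT).
P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
Gamma = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])

A = np.block([[C, Gamma], [Gamma, Gamma]])
Phi = np.block([[P, Gamma], [Gamma, Gamma]])

def rgesim(A, Phi, U, k_I=1.755):
    """Leading eigenvector of K_RG Phi^{-1} A, then Eqs. (8.43)-(8.44)."""
    AU = A @ U
    Phi_inv_AU = np.linalg.solve(Phi, AU)          # Phi^{-1} A U_G
    M = AU.T @ Phi_inv_AU                          # U_G' A Phi^{-1} A U_G
    Q = Phi_inv_AU @ np.linalg.solve(M, AU.T)      # Q_RG
    K = np.eye(A.shape[0]) - Q                     # K_RG = I - Q_RG
    vals, vecs = np.linalg.eig(K @ np.linalg.solve(Phi, A))
    i = np.argmax(vals.real)                       # index of the largest eigenvalue
    beta = vecs[:, i].real
    beta = beta / np.linalg.norm(beta)             # unit length; sign is arbitrary
    sd_index = np.sqrt(beta @ Phi @ beta)
    R = k_I * sd_index                             # Eq. (8.43)
    gain = k_I * (A @ beta) / sd_index             # Eq. (8.44)
    return beta, vals[i].real, R, gain

# U_G (2t x 2r): restrict GY (position 0) and GEBV_GY (position 3).
U = np.zeros((6, 2))
U[0, 0] = 1.0
U[3, 1] = 1.0

beta, lam2, R, gain = rgesim(A, Phi, U)
print(np.round(beta, 3), round(float(R), 3), np.round(gain, 3))
```

Because an eigenvector is defined only up to sign, the printed coefficients and gains may appear with all signs reversed relative to the values reported in the text; the restricted entries of the gain vector should be numerically zero either way.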

### 8.4.2 Estimating RGESIM Parameters

In Sect. 8.2, we indicated how to estimate matrices $\mathbf{P}$, $\boldsymbol{\Gamma}$, and $\mathbf{C}$ using phenotypic and genomic information, whence we can estimate matrices $\mathbf{A} = \begin{bmatrix} \mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ and $\boldsymbol{\Phi} = \begin{bmatrix} \mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$. Those methods are also valid for the RGESIM. This means that the SVD methods described for estimating MESIM parameters are also valid for estimating RGESIM parameters.

### 8.4.3 Numerical Examples

With a selection intensity of 10% ($k_I = 1.755$), we compare the CRLGSI (for details see Chap. 6) versus the RGESIM theoretical results using a real maize (*Zea mays*) F2 population with 244 genotypes (each with two repetitions), 233 molecular markers, and three traits, GY (ton ha$^{-1}$), EHT (cm), and PHT (cm), described in Sect. 8.2.2, where

$$
\widehat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix}, \quad
\widehat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}, \quad \text{and} \quad
\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}
$$

were the estimated matrices of $\mathbf{P}$, $\mathbf{C}$, and $\boldsymbol{\Gamma}$ respectively.

We have indicated that the main difference between the RLPSI and the CRLGSI is the matrix $\mathbf{U}'_C$, on which we now need to impose two restrictions: one for the trait and another for its associated GEBV. Consider the data set described earlier and suppose that we restrict the trait GY (ton ha$^{-1}$) and its associated GEBV$_{\text{GY}}$; then, matrix $\mathbf{U}'_C$ should be constructed as $\mathbf{U}'_{C_1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$. If we restrict traits GY and EHT (cm) and their associated GEBV$_{\text{GY}}$ and GEBV$_{\text{EHT}}$, matrix $\mathbf{U}'_C$ should be constructed as

$$
\mathbf{U}'_{C_2} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix},
$$

etc. The procedure for obtaining matrices $\widehat{\mathbf{K}}_{RG} = [\mathbf{I}_{2t} - \widehat{\mathbf{Q}}_{RG}]$ and $\widehat{\mathbf{Q}}_{RG} = \widehat{\boldsymbol{\Phi}}^{-1}\widehat{\mathbf{A}}\mathbf{U}_G\left(\mathbf{U}'_G\widehat{\mathbf{A}}\widehat{\boldsymbol{\Phi}}^{-1}\widehat{\mathbf{A}}\mathbf{U}_G\right)^{-1}\mathbf{U}'_G\widehat{\mathbf{A}}$ was described in Chap. 6, and is also valid for estimating RGESIM parameters.
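The construction of the restriction matrices above follows a simple pattern: one row per restricted trait, followed by one row per associated GEBV. A minimal sketch, assuming a hypothetical helper named `restriction_matrix` and zero-based trait positions:

```python
import numpy as np

def restriction_matrix(t, restricted):
    """Return U' (2r x 2t): rows for restricted traits, then rows for their GEBVs.

    t          -- number of traits
    restricted -- zero-based positions of the restricted traits
    """
    cols = list(restricted) + [t + j for j in restricted]   # trait columns, then GEBV columns
    U_t = np.zeros((len(cols), 2 * t), dtype=int)
    U_t[np.arange(len(cols)), cols] = 1                      # a single 1 per row
    return U_t

U_C1_t = restriction_matrix(3, [0])       # restrict GY and GEBV_GY
U_C2_t = restriction_matrix(3, [0, 1])    # restrict GY, EHT and their GEBVs
print(U_C1_t)
print(U_C2_t)
```

The matrix $\mathbf{U}_G$ used inside $\widehat{\mathbf{Q}}_{RG}$ is the transpose of the array returned here.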

The estimated CRLGSI vector of coefficients is $\widehat{\boldsymbol{\beta}}_{CR} = \widehat{\mathbf{K}}_{RG}\widehat{\boldsymbol{\beta}}$, where $\widehat{\boldsymbol{\beta}} = \widehat{\boldsymbol{\Phi}}^{-1}\widehat{\mathbf{A}}\mathbf{w}$ is the estimated CLGSI vector of coefficients (Chap. 6). Let $\mathbf{w}' = [5\ \ {-0.1}\ \ {-0.1}\ \ 0\ \ 0\ \ 0]$ be the vector of economic weights and suppose that we restrict trait GY and its associated GEBV$_{\text{GY}}$; in this case, $\mathbf{U}'_{C_1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$, and according to matrices $\widehat{\mathbf{P}}$, $\widehat{\mathbf{C}}$, and $\widehat{\boldsymbol{\Gamma}}$ described earlier, $\widehat{\boldsymbol{\beta}}'_{CR} = [0.076\ \ {-0.004}\ \ {-0.018}\ \ 2.353\ \ {-0.096}\ \ {-0.082}]$ was the estimated CRLGSI vector of coefficients and the estimated CRLGSI was

$$\begin{array}{l} \widehat{I}\_{CR} = 0.076 \text{GY} - 0.004 \text{EHT} - 0.018 \text{PHT} + 2.353 \text{GEBV}\_{\text{GY}} - 0.096 \text{GEBV}\_{\text{EHT}} \\ - 0.082 \text{GEBV}\_{\text{PHT}} \end{array}$$

where GEBVGY, GEBVEHT, and GEBVPHT are the GEBVs associated with the traits GY, EHT, and PHT respectively. The same procedure is valid for two or more restrictions.

The estimated CRLGSI selection response and expected genetic gain per trait were $\widehat{R}_{CR} = k_I\sqrt{\widehat{\boldsymbol{\beta}}'_{CR}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}_{CR}} = 0.96$ and $\widehat{\mathbf{E}}'_{CR} = k_I\dfrac{\widehat{\boldsymbol{\beta}}'_{CR}\widehat{\mathbf{A}}}{\sqrt{\widehat{\boldsymbol{\beta}}'_{CR}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}_{CR}}} = [0\ \ {-3.53}\ \ {-6.03}\ \ 0\ \ {-2.93}\ \ {-4.87}]$ respectively, whereas the estimated CRLGSI accuracy was $\widehat{\rho}_{HI_{CR}} = \dfrac{\widehat{\sigma}_{I_{CR}}}{\widehat{\sigma}_H} = 0.51$. Note that in $\widehat{\mathbf{E}}'_{CR}$, the trait GY and its associated GEBV$_{\text{GY}}$ have null values, as we would expect.
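The CRLGSI quantities above can be recomputed from the quoted matrices. The following is a sketch under stated assumptions: the function name `crlgsi` is hypothetical, the inputs are the rounded estimates printed in this section, and the economic weights are taken as $[5, -0.1, -0.1, 0, 0, 0]$. Because the inputs are rounded to two decimals, the output can be compared with the values reported in the text but need not match them to the last digit.

```python
import numpy as np

# Rounded maize estimates quoted in this section (traits GY, EHT, PHT).
P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
Gamma = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])

A = np.block([[C, Gamma], [Gamma, Gamma]])
Phi = np.block([[P, Gamma], [Gamma, Gamma]])
w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])    # economic weights (assumed signs)

# U_G (2t x 2r): restrict GY (position 0) and GEBV_GY (position 3).
U = np.zeros((6, 2))
U[0, 0] = 1.0
U[3, 1] = 1.0

def crlgsi(A, Phi, U, w, k_I=1.755):
    """beta_CR = K Phi^{-1} A w with K = I - Q, plus response, gains, accuracy."""
    AU = A @ U
    Phi_inv_AU = np.linalg.solve(Phi, AU)                         # Phi^{-1} A U
    M = AU.T @ Phi_inv_AU                                         # U' A Phi^{-1} A U
    K = np.eye(A.shape[0]) - Phi_inv_AU @ np.linalg.solve(M, AU.T)
    beta = K @ np.linalg.solve(Phi, A @ w)                        # restricted coefficients
    sd_index = np.sqrt(beta @ Phi @ beta)                         # sigma_I
    R = k_I * sd_index                                            # selection response
    gain = k_I * (A @ beta) / sd_index                            # expected genetic gains
    accuracy = sd_index / np.sqrt(w @ A @ w)                      # sigma_I / sigma_H
    return beta, R, gain, accuracy

beta_CR, R_CR, E_CR, rho_CR = crlgsi(A, Phi, U, w)
print(np.round(beta_CR, 3), round(float(R_CR), 2), np.round(E_CR, 2), round(float(rho_CR), 2))
```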

The estimated RGESIM vector of coefficients was $\widehat{\boldsymbol{\beta}}'_{RG} = [0.015\ \ {-0.001}\ \ {-0.004}\ \ 0.998\ \ {-0.029}\ \ {-0.045}]$, and the estimated RGESIM index was $\widehat{I}_{RG} = 0.015\,\text{GY} - 0.001\,\text{EHT} - 0.004\,\text{PHT} + 0.998\,\text{GEBV}_{\text{GY}} - 0.029\,\text{GEBV}_{\text{EHT}} - 0.045\,\text{GEBV}_{\text{PHT}}$, where GEBV$_{\text{GY}}$, GEBV$_{\text{EHT}}$, and GEBV$_{\text{PHT}}$ are the GEBVs associated with traits GY, EHT, and PHT respectively. The same procedure is valid for two or more restrictions.

The estimated RGESIM selection response and expected genetic gain per trait were $\widehat{R}_{RG} = k_I\sqrt{\widehat{\boldsymbol{\beta}}'_{RG}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}_{RG}} = 0.37$ and $\widehat{\mathbf{E}}'_{RG} = k_I\dfrac{\widehat{\boldsymbol{\beta}}'_{RG}\widehat{\mathbf{A}}}{\sqrt{\widehat{\boldsymbol{\beta}}'_{RG}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}_{RG}}} = [0\ \ {-3.28}\ \ {-6.03}\ \ 0\ \ {-2.93}\ \ {-5.40}]$ respectively, whereas the estimated RGESIM accuracy was $\widehat{\rho}_{\widehat{H}_{RG}\widehat{I}_{RG}} = \dfrac{\widehat{\sigma}_{\widehat{I}_{RG}}}{\widehat{\sigma}_{\widehat{H}_{RG}}} = 0.86$.

Fig. 8.3 presents the frequency distribution of the 244 estimated RGESIM index values for two null restrictions on traits GY and EHT and their associated GEBVGY and GEBVEHT, for one selection cycle in an environment for a real maize (Zea mays) F2 population with 233 molecular markers. Note that the frequency distribution of the estimated RGESIM index values approaches the normal distribution.

Now we compare the estimated CRLGSI and RGESIM selection responses and expected genetic gains per trait using the simulated data set described in Sect. 2.8.1 of Chap. 2. We used that data set for four phenotypic selection cycles (C2, C3, C4, and C5), each with four traits (T1, T2, T3, and T4), 500 genotypes, and four replicates per genotype. The economic weights for T1, T2, T3, and T4 were 1, 1, 1, and 1 respectively. For this data set, matrix $\mathbf{F}$ was an identity matrix of size $8 \times 8$ for all four selection cycles.

Fig. 8.3 Frequency distribution of the 244 estimated restricted genomic eigen selection index method (RGESIM) values for two null restrictions on traits grain yield (GY) and EHT and their associated genomic estimated breeding values (GEBVs), GEBV$_{\text{GY}}$ and GEBV$_{\text{EHT}}$ respectively, for one selection cycle in an environment for a real maize (*Zea mays*) F2 population with 233 molecular markers. Note that the frequency distribution of the estimated RGESIM index values approaches normal distribution

Columns 2, 3, and 4 (from left to right) of Table 8.4 present the estimated CRLGSI selection responses when their vectors of coefficients are normalized and the estimated RGESIM selection responses for one, two, and three restrictions for four simulated selection cycles. The averages of the estimated CRLGSI selection responses of the traits and their associated GEBVs for each of the three null restrictions were 3.24 for one restriction, 4.08 for two restrictions, and 5.06 for three restrictions, whereas the averages of the estimated RGESIM selection responses were 3.08 for one restriction, 2.79 for two restrictions, and 3.23 for three restrictions. Note that although for one restriction the selection response was similar for both indices, for two and three restrictions the CRLGSI selection responses were greater than the RGESIM selection responses.

Table 8.5 presents the estimated CRLGSI and RGESIM expected genetic gains per trait for four traits (T1, T2, T3, and T4) and their associated GEBVs (in this case denoted by G1, G2, G3, and G4 to simplify the notation) in four simulated selection cycles and for one, two, and three null restrictions. Note that the null values of the restricted traits and their GEBVs are not shown in Table 8.5, with the aim of simplifying the table. The averages of the estimated CRLGSI expected genetic gains for the unrestricted traits and their associated GEBVs were 2.60, 2.16, 2.84, 1.21, 0.67, and 1.02 for one restriction; 2.74, 3.23, 0.78, and 0.99 for two restrictions; and 4.02 and 1.33 for three restrictions. On the other hand, the averages of the estimated RGESIM expected genetic gains for the unrestricted traits and their associated GEBVs were 3.27, 1.67, 1.33, 2.16, 0.92, and 0.84 for one restriction; 3.29, 1.02, 1.76, and 0.46 for two restrictions; and 3.53 and 2.07 for three restrictions. These results indicate that in terms of absolute values, the estimated expected genetic gains for the traits and their associated GEBVs were similar for both indices.

Table 8.4 Estimated combined null restricted linear genomic selection index (CRLGSI) and estimated combined predetermined proportional gain linear genomic selection index (CPPG-LGSI) selection responses for one, two, and three restrictions when their vectors of coefficients are normalized for four simulated selection cycles

Estimated null restricted genomic eigen selection index method (RGESIM) and predetermined proportional gain genomic eigen selection index method (PPG-GESIM) selection responses for one, two, and three restrictions for four simulated selection cycles. The selection intensity was 10% ($k_I = 1.755$)

Table 8.5 Estimated CRLGSI and estimated null RGESIM expected genetic gains per trait for four traits (T1, T2, T3, and T4) and their associated genomic estimated breeding values (G1, G2, G3, and G4) for four simulated selection cycles and for one, two, and three null restrictions. The selection intensity was 10% ($k_I = 1.755$)

| Cycle (RGESIM) | One restriction<sup>a</sup> | Two restrictions<sup>b</sup> | Three restrictions<sup>c</sup> |
|---|---|---|---|
| 4 | 3.56, 1.73, 1.23, 1.92, 0.89, 0.78 | 3.40, 0.96, 1.62, 0.53 | 3.58, 2.02 |
| Mean | 3.27, 1.67, 1.33, 2.16, 0.92, 0.84 | 3.29, 1.02, 1.76, 0.46 | 3.53, 2.07 |

<sup>a</sup> All T1 and G1 expected genetic gains were null

<sup>b</sup> All T1, T2, G1, and G2 expected genetic gains were null

<sup>c</sup> All T1, T2, T3, G1, G2, and G3 expected genetic gains were null

#### 8.5 The Predetermined Proportional Gain Linear Genomic Eigen Selection Index Method

The predetermined proportional gain linear genomic eigen selection index method (PPG-GESIM) theory is based on the predetermined proportional gain linear phenotypic ESIM (PPG-ESIM) described in Chap. 7. In the PPG-ESIM, the vector of PPG (predetermined proportional gain) imposed by the breeder was $\mathbf{d}' = [d_1\ d_2\ \cdots\ d_r]$. However, because the PPG-GESIM uses phenotypic and GEBV information jointly to predict the net genetic merit, the vector of PPG imposed by the breeder ($\mathbf{d}_{PG}$) should be twice as long as the standard vector $\mathbf{d}'$, that is, $\mathbf{d}'_{PG} = [d_1\ d_2\ \cdots\ d_r\ d_{r+1}\ d_{r+2}\ \cdots\ d_{2r}]$, where we would expect that if $d_1$ is the PPG imposed on trait 1, then $d_{r+1}$ should be the PPG imposed on the GEBV associated with trait 1, etc. Thus, in the PPG-GESIM we have three possible options for determining (for each trait and GEBV) the PPG: e.g., for trait 1, $d_1 = d_{r+1}$, $d_1 > d_{r+1}$, or $d_1 < d_{r+1}$. This is the main difference between the standard PPG-ESIM described in Chap. 7 and the PPG-GESIM.
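As a small illustration of how $\mathbf{d}_{PG}$ is assembled, the sketch below stacks the trait PPG values on top of the GEBV PPG values. The helper name `build_d_pg` and the `gebv_scale` parameter are hypothetical conveniences: a single scaling factor is only one way to choose the GEBV entries, and the breeder may set each $d_{r+j}$ independently.

```python
import numpy as np

def build_d_pg(d_traits, gebv_scale=1.0):
    """Stack trait PPG values and GEBV PPG values into one vector of length 2r.

    gebv_scale = 1 gives d_j = d_{r+j}; values below 1 give |d_j| > |d_{r+j}|;
    values above 1 give |d_j| < |d_{r+j}|.
    """
    d_traits = np.asarray(d_traits, dtype=float)
    return np.concatenate([d_traits, gebv_scale * d_traits])

# Two restricted traits with GEBV entries set to half the trait entries.
d_pg = build_d_pg([7, -3], gebv_scale=0.5)
print(d_pg)
```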

### 8.5.1 The PPG-GESIM Parameters

Using the same procedure described for RGESIM and PPG-ESIM, the PPG-GESIM vector of coefficients ($\boldsymbol{\beta}_{PG}$), which maximizes the PPG-GESIM selection response and the expected genetic gain per trait, is the first eigenvector of the following equation

$$(\mathbf{T}_{PG} - \lambda_{PG}^2\mathbf{I}_{2t})\boldsymbol{\beta}_{PG} = \mathbf{0}, \tag{8.45}$$

where $\mathbf{T}_{PG} = \mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A} + \mathbf{B}$, $\mathbf{K}_{RG} = [\mathbf{I}_{2t} - \mathbf{Q}_{RG}]$, $\mathbf{I}_{2t}$ is an identity matrix of size $2t \times 2t$, $\mathbf{Q}_{RG} = \boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\left(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\right)^{-1}\mathbf{U}'_G\mathbf{A}$, $\mathbf{B} = \boldsymbol{\delta}\boldsymbol{\varphi}'$, $\boldsymbol{\delta} = \boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\left(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\right)^{-1}\mathbf{d}_{PG}$, and

$$
\boldsymbol{\varphi}' = \frac{\mathbf{d}'_{PG}\left(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\right)^{-1}\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}}{\mathbf{d}'_{PG}\left(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\right)^{-1}\mathbf{d}_{PG}}.
$$

When $\mathbf{B}$ is a null matrix, $\mathbf{T}_{PG} = \mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A}$ (matrix of the RGESIM), and when $\mathbf{U}'_G$ is a null matrix, $\mathbf{T}_{PG} = \boldsymbol{\Phi}^{-1}\mathbf{A}$ (matrix of the GESIM); this means that the PPG-GESIM includes the RGESIM and GESIM as particular cases. The optimized PPG-GESIM index can be written as $I_{PG} = \boldsymbol{\beta}'_{PG}\mathbf{f}$.

The vector of coefficients of $H = \mathbf{w}'_{PG}\boldsymbol{\alpha}$ in the PPG-GESIM can be written as

$$\mathbf{w}_{PG} = \mathbf{A}^{-1}\left[\boldsymbol{\Phi} + \mathbf{Q}'_{PG}\mathbf{A}\right]\boldsymbol{\beta}_{PG}, \tag{8.46}$$

where $\mathbf{Q}'_{PG} = \mathbf{A}\mathbf{U}_G\mathbf{D}_G\left(\mathbf{D}'_G\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\mathbf{D}_G\right)^{-1}\mathbf{D}'_G\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}$, and

$$
\mathbf{D}'_G = \begin{bmatrix} d_{2r} & 0 & \cdots & 0 & -d_1 \\ 0 & d_{2r} & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_{2r} & -d_{2r-1} \end{bmatrix}.
$$

Similar to the RGESIM, it can be shown that the covariance between $H_{PG} = \mathbf{w}'_{PG}\boldsymbol{\alpha}$ and $I_{PG} = \boldsymbol{\beta}'_{PG}\mathbf{f}$ ($\sigma_{H_{PG}I_{PG}}$) is equal to the variance of $I_{PG} = \boldsymbol{\beta}'_{PG}\mathbf{f}$ ($\sigma^2_{I_{PG}} = \boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG}$), that is, $\sigma_{H_{PG}I_{PG}} = \mathbf{w}'_{PG}\mathbf{A}\boldsymbol{\beta}_{PG} = \boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG} = \sigma^2_{I_{PG}}$.
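The structure of $\mathbf{D}'_G$ can be made concrete with a few lines of code. This is a sketch only; the helper name `d_matrix_t` is hypothetical. It builds the $(2r - 1) \times 2r$ matrix from a given $\mathbf{d}_{PG}$ and checks the property that follows directly from its rows, namely $\mathbf{D}'_G\mathbf{d}_{PG} = \mathbf{0}$, since row $i$ gives $d_{2r}d_i - d_i d_{2r} = 0$.

```python
import numpy as np

def d_matrix_t(d_pg):
    """Build D_G' with d_{2r} on the diagonal and -d_i in the last column."""
    d_pg = np.asarray(d_pg, dtype=float)
    m = d_pg.size                                  # m = 2r
    D_t = np.zeros((m - 1, m))
    D_t[:, :-1] = d_pg[-1] * np.eye(m - 1)         # d_{2r} on the diagonal
    D_t[:, -1] = -d_pg[:-1]                        # last column: -d_1, ..., -d_{2r-1}
    return D_t

d_pg = np.array([7.0, -3.0, 3.5, -1.5])            # example PPG vector with r = 2
D_t = d_matrix_t(d_pg)
print(D_t)
print(D_t @ d_pg)                                  # every entry is zero
```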

The maximized correlation between $I_{PG}$ and $H_{PG}$, or PPG-GESIM accuracy, is

$$\rho_{H_{PG}I_{PG}} = \frac{\sqrt{\boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG}}}{\sqrt{\mathbf{w}'_{PG}\mathbf{A}\mathbf{w}_{PG}}}, \tag{8.47}$$

where $\mathbf{w}'_{PG}\mathbf{A}\mathbf{w}_{PG}$ is the variance of $H_{PG}$. Hereafter, to simplify the notation, we write Eq. (8.47) as $\lambda_{PG}$.

The maximized selection response and the expected genetic gain per trait of the PPG-GESIM are

$$R_{PG} = k_I\sqrt{\boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG}} \tag{8.48}$$

and

$$\mathbf{E}_{PG} = k_I\frac{\mathbf{A}\boldsymbol{\beta}_{PG}}{\sqrt{\boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG}}}, \tag{8.49}$$

respectively, where $\boldsymbol{\beta}_{PG}$ is the first eigenvector of Eq. (8.45).
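A numerical sketch of Eqs. (8.45), (8.48), and (8.49) follows. It is an illustration under stated assumptions rather than the authors' implementation: the function name `ppg_gesim` is hypothetical, the matrices are the rounded maize estimates used in the numerical examples of this chapter, the PPG vector is taken as $[7, -3, 3.5, -1.5]$ for GY, EHT and their GEBVs, and no transformation matrix $\mathbf{F}$ is applied.

```python
import numpy as np

# Rounded maize estimates used in the numerical examples (GY, EHT, PHT).
P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
G = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
Gamma = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])

A = np.block([[G, Gamma], [Gamma, Gamma]])
Phi = np.block([[P, Gamma], [Gamma, Gamma]])

def ppg_gesim(A, Phi, U, d, k_I=1.755):
    """Leading eigenvector of T_PG = K_RG Phi^{-1} A + delta phi' (Eq. 8.45)."""
    AU = A @ U
    Phi_inv_AU = np.linalg.solve(Phi, AU)              # Phi^{-1} A U_G
    Phi_inv_A = np.linalg.solve(Phi, A)                # Phi^{-1} A
    M = AU.T @ Phi_inv_AU                              # U_G' A Phi^{-1} A U_G
    K = np.eye(A.shape[0]) - Phi_inv_AU @ np.linalg.solve(M, AU.T)
    M_inv_d = np.linalg.solve(M, d)
    delta = Phi_inv_AU @ M_inv_d                       # delta
    phi = (M_inv_d @ AU.T @ Phi_inv_A) / (d @ M_inv_d) # phi'
    T = K @ Phi_inv_A + np.outer(delta, phi)           # T_PG
    vals, vecs = np.linalg.eig(T)
    i = np.argmax(vals.real)                           # index of the largest eigenvalue
    beta = vecs[:, i].real
    beta = beta / np.linalg.norm(beta)                 # unit length; sign is arbitrary
    sd_index = np.sqrt(beta @ Phi @ beta)
    return beta, vals[i].real, k_I * sd_index, k_I * (A @ beta) / sd_index

# U_G (2t x 2r): restrict GY, EHT (positions 0, 1) and their GEBVs (positions 3, 4).
U = np.zeros((6, 4))
U[[0, 1, 3, 4], [0, 1, 2, 3]] = 1.0
d = np.array([7.0, -3.0, 3.5, -1.5])                   # assumed PPG vector

beta_PG, lam2_PG, R_PG, E_PG = ppg_gesim(A, Phi, U, d)
print(np.round(beta_PG, 3), round(float(R_PG), 3), np.round(E_PG, 3))
```

In this sketch the restricted entries of $\mathbf{A}\boldsymbol{\beta}_{PG}$ come out proportional to `d`, but the proportionality constant can be negative because the sign of an eigenvector is arbitrary; choosing the direction of the vector is a separate step.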

### 8.5.2 Numerical Examples

The process for estimating PPG-GESIM parameters is similar to the method described for estimating RGESIM parameters. With a selection intensity of 10% ($k_I = 1.755$), we compare the combined predetermined proportional gain linear genomic selection index (CPPG-LGSI) and PPG-GESIM results using the real maize (*Zea mays*) F2 population with 244 genotypes, 233 molecular markers, and three traits, GY (ton ha$^{-1}$), EHT (cm), and PHT (cm), where

$$
\widehat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix}, \quad
\widehat{\mathbf{G}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}, \quad \text{and} \quad
\widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}
$$

are the estimated matrices of $\mathbf{P}$, $\mathbf{G}$, and $\boldsymbol{\Gamma}$ respectively, whereas $\mathbf{w}' = [5\ \ {-0.1}\ \ {-0.1}\ \ 0\ \ 0\ \ 0]$ was the vector of economic weights.

The estimated CPPG-LGSI vector of coefficients was $\widehat{\boldsymbol{\beta}}_{CP} = \widehat{\boldsymbol{\beta}}_{CG} + \widehat{\theta}_{CP}\widehat{\boldsymbol{\delta}}$ (see Chap. 6 for additional details). Let $\widehat{\mathbf{A}} = \begin{bmatrix} \widehat{\mathbf{G}} & \widehat{\boldsymbol{\Gamma}} \\ \widehat{\boldsymbol{\Gamma}} & \widehat{\boldsymbol{\Gamma}} \end{bmatrix}$ and $\widehat{\boldsymbol{\Phi}} = \begin{bmatrix} \widehat{\mathbf{P}} & \widehat{\boldsymbol{\Gamma}} \\ \widehat{\boldsymbol{\Gamma}} & \widehat{\boldsymbol{\Gamma}} \end{bmatrix}$ be the estimated block matrices and $\mathbf{d}'_{PG} = [7\ \ {-3}\ \ 3.5\ \ {-1.5}]$ the vector of PPG imposed by the breeder on the traits GY and EHT, and their associated genomic estimated breeding values (GEBV$_{\text{GY}}$ and GEBV$_{\text{EHT}}$), and let

$$
\mathbf{U}'_C = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}
$$

be the matrix of null restrictions on the CPPG-LGSI and $\mathbf{w}' = [5\ \ {-0.1}\ \ {-0.1}\ \ 0\ \ 0\ \ 0]$ the vector of economic weights. It can be shown that $\widehat{\theta}_{CP} = -0.00009$ is the estimated value of the proportionality constant, $\widehat{\boldsymbol{\delta}}' = [112.92\ \ 72.16\ \ 61.35\ \ 231.79\ \ 64.75\ \ 61.35]$, $\widehat{\boldsymbol{\beta}}'_{CP} = [{-0.01}\ \ 0.01\ \ {-0.01}\ \ 0.59\ \ 0.09\ \ {-0.09}]$ is the estimated CPPG-LGSI vector of coefficients, and the estimated CPPG-LGSI can be written as

$$\begin{array}{l} \widehat{I}_{CP} = -0.01\,\text{GY} + 0.01\,\text{EHT} - 0.01\,\text{PHT} + 0.59\,\text{GEBV}_{\text{GY}} + 0.09\,\text{GEBV}_{\text{EHT}} \\ \qquad - 0.09\,\text{GEBV}_{\text{PHT}} \end{array}$$

where GEBV$_{\text{GY}}$, GEBV$_{\text{EHT}}$, and GEBV$_{\text{PHT}}$ are the GEBVs associated with traits GY, EHT, and PHT respectively. The same procedure is valid for more than two predetermined restrictions. The estimated CPPG-LGSI selection response and expected genetic gain per trait were $\widehat{R}_{CP} = k_I\sqrt{\widehat{\boldsymbol{\beta}}'_{CP}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}_{CP}} = 0.443$ and $\widehat{\mathbf{E}}'_{CP} = k_I\dfrac{\widehat{\boldsymbol{\beta}}'_{CP}\widehat{\mathbf{A}}}{\sqrt{\widehat{\boldsymbol{\beta}}'_{CP}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}_{CP}}} = [{-0.004}\ \ 0.002\ \ {-4.639}\ \ {-0.002}\ \ 0.001\ \ {-4.326}]$ respectively, whereas the estimated CPPG-LGSI accuracy is $\widehat{\rho}_{HI_{CP}} = \dfrac{\widehat{\sigma}_{I_{CP}}}{\widehat{\sigma}_H} = 0.234$.

Because the estimated value of the proportionality constant was negative ($\widehat{\theta}_{CP} = -0.00009$), the expected genetic gains of the traits GY and EHT, and their associated genomic estimated breeding values (GEBV$_{\text{GY}}$ and GEBV$_{\text{EHT}}$), which appear in the $\widehat{\mathbf{E}}'_{CP}$ values, were not in accordance with the values of the vector of PPG imposed by the breeder, $\mathbf{d}'_{PG} = [7\ \ {-3}\ \ 3.5\ \ {-1.5}]$, and the CPPG-LGSI accuracy (0.234) was low. These results indicate that in the CPPG-LGSI, it is very important for the estimated values of $\widehat{\theta}_{CP}$ to be positive (see Chaps. 3 and 6 for details).

In the PPG-GESIM, we need to find the solutions to equation $(\widehat{\mathbf{T}}_{PG} - \widehat{\lambda}^2_{PG_j}\mathbf{I}_{2t})\widehat{\boldsymbol{\beta}}_{PG_j} = \mathbf{0}$, for $\widehat{\lambda}^2_{PG_j}$ and $\widehat{\boldsymbol{\beta}}_{PG_j}$ (see Eq. 8.45). The estimated PPG-GESIM vector of coefficients was $\widehat{\boldsymbol{\beta}}'_{PG} = [0.001\ \ {-0.050}\ \ 0.029\ \ 0.975\ \ 0.154\ \ {-0.157}]$, which was transformed using matrix

$$
\mathbf{F} = \begin{bmatrix} 0.1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix},
$$

that is, we changed the direction of the original vector. With the $\widehat{\boldsymbol{\beta}}'_{PG}$ values, we can estimate the PPG-GESIM index as

$$\begin{array}{l} \widehat{I}_{PG} = 0.001\,\text{GY} - 0.05\,\text{EHT} + 0.029\,\text{PHT} + 0.975\,\text{GEBV}_{\text{GY}} + 0.154\,\text{GEBV}_{\text{EHT}} \\ \qquad - 0.157\,\text{GEBV}_{\text{PHT}} \end{array}$$

where GEBV$_{\text{GY}}$, GEBV$_{\text{EHT}}$, and GEBV$_{\text{PHT}}$ are the GEBVs associated with the traits GY, EHT, and PHT respectively. The estimated PPG-GESIM selection response, accuracy, and expected genetic gain per trait were $\widehat{R}_{PG} = k_I\sqrt{\widehat{\boldsymbol{\beta}}'_{PG}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}_{PG}} = 0.696$, $\widehat{\rho}_{\widehat{H}_{PG}\widehat{I}_{PG}} = \dfrac{\widehat{\sigma}_{\widehat{I}_{PG}}}{\widehat{\sigma}_{\widehat{H}_{PG}}} = 0.843$, and $\widehat{\mathbf{E}}'_{PG} = k_I\dfrac{\widehat{\boldsymbol{\beta}}'_{PG}\widehat{\mathbf{A}}}{\sqrt{\widehat{\boldsymbol{\beta}}'_{PG}\widehat{\boldsymbol{\Phi}}\widehat{\boldsymbol{\beta}}_{PG}}} = [0.01\ \ {-1.00}\ \ {-3.56}\ \ 0\ \ {-0.46}\ \ {-3.98}]$ respectively.

Fig. 8.4 presents the frequency distribution of the 244 estimated PPG-GESIM index values for two predetermined restrictions on the traits GY and EHT and their associated GEBVs (GEBV$_{\text{GY}}$ and GEBV$_{\text{EHT}}$), for one selection cycle in an environment for a real maize (*Zea mays*) F2 population with 233 molecular markers. Note that the frequency distribution of the estimated PPG-GESIM index values approaches normal distribution.

Now, with a selection intensity of 10% ($k_I = 1.755$) and a vector of predetermined restrictions $\mathbf{d}'_{PG} = [7\ \ {-3}\ \ 5\ \ 3.5\ \ {-1.5}\ \ 2.5]$, we compare the estimated CPPG-LGSI and PPG-GESIM selection responses and expected genetic gains per trait using the simulated data set described in Sect. 2.8.1 of Chap. 2. Traits T1, T2, and T3 and their associated GEBVs (GEBV1, GEBV2, and GEBV3 respectively) were restricted, but trait T4 and its associated GEBV4 were not restricted. For this data set, matrix $\mathbf{F}$ was an identity matrix of size $8 \times 8$ for all four selection cycles.

Fig. 8.4 Frequency distribution of the 244 estimated predetermined proportional gain genomic eigen selection index method (PPG-GESIM) values for two predetermined restrictions on the traits GY and EHT and their associated GEBVs, GEBV$_{\text{GY}}$ and GEBV$_{\text{EHT}}$, for one selection cycle in an environment for a real maize (*Zea mays*) F2 population with 233 molecular markers

Table 8.6 presents the estimated CPPG-LGSI selection responses when their vectors of coefficients are normalized, and the estimated PPG-GESIM selection responses for one, two, and three predetermined restrictions for four simulated selection cycles. The averages of the estimated CPPG-LGSI selection responses were 5.08 for one restriction, 3.42 for two restrictions, and 1.60 for three restrictions, whereas the averages of the estimated PPG-GESIM selection responses were 1.96 for one restriction, 4.14 for two restrictions, and 5.46 for three restrictions. For this data set, when the number of restrictions increases, the estimated CPPG-LGSI selection response tends to decrease, whereas the estimated PPG-GESIM selection response increases.

Table 8.6 Estimated CPPG-LGSI expected genetic gains for one, two, and three restricted predetermined traits (T1, T2, and T3) and for one, two, and three restricted predetermined GEBVs (GEBV1, GEBV2, and GEBV3) for four simulated selection cycles

The selection intensity was 10% ($k_I = 1.755$) and the vector of predetermined restrictions was $\mathbf{d}'_{PG} = [7\ \ {-3}\ \ 5\ \ 3.5\ \ {-1.5}\ \ 2.5]$. Trait T4 and its associated GEBV4 were not restricted

Tables 8.6 and 8.7 present the estimated CPPG-LGSI and PPG-GESIM expected genetic gains respectively, for one, two, and three predetermined restrictions and for four simulated selection cycles. The averages of the estimated CPPG-LGSI expected genetic gains for the four traits and their four associated GEBVs were 8.28, 4.12, 3.23, 2.23, 4.14, 2.26, 1.71, and 1.01 for one restriction; 8.43, 3.61, 3.28, 2.13, 4.22, 1.81, 1.72, and 0.93 for two restrictions; and 5.81, 2.49, 4.15, 2.26, 2.90, 1.24, 2.07, and 0.89 for three restrictions. On the other hand, the averages of the estimated PPG-GESIM expected genetic gains for the four traits and their four associated GEBVs were 6.97, 1.31, 1.78, 0.52, 5.64, 1.74, 1.75, and 0.58 for one restriction; 6.93, 2.73, 1.29, 0.85, 5.75, 2.55, 1.49, and 0.79 for two restrictions; and 8.12, 3.27, 2.99, 1.13, 2.19, 1.15, 1.30, and 0.45 for three restrictions. These results indicate that the estimated CPPG-LGSI expected genetic gains for the four traits and their four associated GEBVs were generally higher than the corresponding estimated PPG-GESIM expected genetic gains.

Table 8.7 Estimated PPG-GESIM expected genetic gains for one, two, and three restricted traits (T1, T2, and T3) and for one, two, and three restricted GEBVs (GEBV1, GEBV2, and GEBV3) for four simulated selection cycles

The selection intensity was 10% ($k_I = 1.755$) and the vector of predetermined restrictions was $\mathbf{d}'_{PG} = [7\ \ {-3}\ \ 5\ \ 3.5\ \ {-1.5}\ \ 2.5]$. Trait T4 and its associated GEBV4 were not restricted

#### References

Crossa J, Cerón-Rojas JJ (2011) Multi-trait multi-environment genome-wide molecular marker selection indices. J Indian Soc Agric Stat 62(2):125–142

Meyer CD (2000) Matrix analysis and applied linear algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA


### Chapter 9 Multistage Linear Selection Indices

Abstract Multistage linear selection indices select individual traits available at different times or stages and are applied mainly in animal and tree breeding, where the traits under consideration become evident at different ages. The main indices are the unrestricted, the restricted, and the predetermined proportional gain selection indices. The restricted and predetermined proportional gain indices allow null and predetermined restrictions to be imposed on the trait expected genetic gain (or multitrait selection response) values, whereas the rest of the traits change without any restriction. The three indices can use phenotypic, genomic, or both sets of information to predict the unobservable net genetic merit values of the candidates for selection. All of them maximize the selection response and the expected genetic gain for each trait, have maximum accuracy, are the best predictor of the net genetic merit, and provide the breeder with an objective rule for evaluating and selecting several traits simultaneously. The theory of the foregoing indices is based on the independent culling method and on the linear phenotypic selection index, and is described in this chapter in the phenotypic and genomic selection context. Their theoretical results are validated in a two-stage breeding selection scheme using real and simulated data.

#### 9.1 Multistage Linear Phenotypic Selection Index

In a similar manner to the linear phenotypic selection index (LPSI, Chap. 2), the objectives of the multistage linear phenotypic selection index (MLPSI) are:


4. To provide the breeder with an objective rule for evaluating and selecting several traits simultaneously.

When selection is based on all the individual traits of interest jointly, the LPSI vector of coefficients that maximizes the selection response $R = k\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ and the expected genetic gain per trait $\mathbf{E} = k\frac{\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ is $\mathbf{b} = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$, where $\mathbf{C}$ and $\mathbf{P}$ are the covariance matrices of the true breeding values ($\mathbf{g}$) and trait phenotypic values ($\mathbf{y}$) respectively, and $k$ is the selection intensity. In MLPSI terminology, the LPSI is called a one-stage selection index. The MLPSI is an extension of the LPSI theory to the multistage selection context and, as we shall see, the MLPSI theoretical results are very similar to the LPSI theoretical results described in Chap. 2.

### 9.1.1 The MLPSI Parameters for Two Stages

Let $\mathbf{y}' = [y_1 \; y_2 \; \cdots \; y_t]$ be a vector with $t$ traits of interest and suppose that we can select only $n_i$ of them ($n_i < t$) at stage $i$ ($i = 1, 2, \ldots, N$), such that after $N$ stages ($N < t$), $\sum_{i=1}^{N} n_i = t$. Thus, for each stage we should have a selection index with a different number of traits. For example, at stage $i$ the index would be $I_i = \sum_{j=1}^{n_i} b_{ij} y_{ij}$, and at stage $N$ the index would be

$$I_N = \sum_{j=1}^{n_1} b_{1j} y_{1j} + \sum_{j=1}^{n_2} b_{2j} y_{2j} + \cdots + \sum_{j=1}^{n_N} b_{Nj} y_{Nj} = \sum_{i=1}^{N} I_i,$$

where the double subscript of $y_{ij}$ indicates that the $j$th trait is measured at stage $i$, so that at each sub-index $I_i$, all the $n_i$ traits are measured at the same age.

Suppose that there are four traits of interest and that $\mathbf{y}' = [y_1 \; y_2 \; y_3 \; y_4]$ is the vector of observable phenotypic values and $\mathbf{g}' = [g_1 \; g_2 \; g_3 \; g_4]$ is the vector of unobservable breeding values. If at the first and second stages we select two traits, then $n_1 = n_2 = 2$ and $\mathbf{y}'$ can be partitioned as $\mathbf{y}' = [\mathbf{x}_1' \; \mathbf{x}_2']$, where $\mathbf{x}_1' = [y_1 \; y_2]$ and $\mathbf{x}_2' = [y_3 \; y_4]$ are the vectors of traits that become evident at the first and second stages respectively. At the first stage, the phenotypic covariance matrix of $\mathbf{x}_1$ ($\mathbf{P}_1$) and the covariance matrix of $\mathbf{x}_1$ with the vector of true breeding values $\mathbf{g}$ ($\mathbf{G}_1$) can be written as

$$\text{Var}(\mathbf{x}_1) = \begin{bmatrix} \text{Var}(y_1) & \text{Cov}(y_1, y_2) \\ \text{Cov}(y_2, y_1) & \text{Var}(y_2) \end{bmatrix} = \mathbf{P}_1$$

and

$$\text{Cov}(\mathbf{x}\_{1}, \mathbf{g}) = \begin{bmatrix} \text{Cov}(\mathbf{y}\_{1}, \mathbf{g}\_{1}) & \text{Cov}(\mathbf{y}\_{1}, \mathbf{g}\_{2}) & \text{Cov}(\mathbf{y}\_{1}, \mathbf{g}\_{3}) & \text{Cov}(\mathbf{y}\_{1}, \mathbf{g}\_{4}) \\ \text{Cov}(\mathbf{y}\_{2}, \mathbf{g}\_{1}) & \text{Cov}(\mathbf{y}\_{2}, \mathbf{g}\_{2}) & \text{Cov}(\mathbf{y}\_{2}, \mathbf{g}\_{3}) & \text{Cov}(\mathbf{y}\_{2}, \mathbf{g}\_{4}) \end{bmatrix} = \mathbf{G}\_{1}$$

respectively. For the second stage, in addition to matrix $\mathbf{P}_1$, we need the phenotypic covariance matrix between $\mathbf{x}_1$ and $\mathbf{x}_2$ ($\mathbf{P}_{12}$) and the phenotypic covariance matrix of $\mathbf{x}_2$ ($\mathbf{P}_2$); thus, the covariance matrix of phenotypic values at stage 2 is $\mathbf{P} = \begin{bmatrix} \mathbf{P}_1 & \mathbf{P}_{12} \\ \mathbf{P}_{21} & \mathbf{P}_2 \end{bmatrix}$. In a similar manner, in addition to matrix $\mathbf{G}_1$, at stage 2 we need the covariance between $\mathbf{x}_2$ and $\mathbf{g}$ ($\mathbf{G}_2$); that is, at stage 2 the covariance matrix between phenotypic and breeding values can be written as $\mathbf{G} = \begin{bmatrix} \mathbf{G}_1 \\ \mathbf{G}_2 \end{bmatrix}$. Matrices $\mathbf{G}$ and $\mathbf{C}$ are not exactly the same, because although $\mathbf{C} = \text{Var}(\mathbf{g})$, $\mathbf{G} = \begin{bmatrix} \text{Cov}(\mathbf{x}_1, \mathbf{g}) \\ \text{Cov}(\mathbf{x}_2, \mathbf{g}) \end{bmatrix} = \begin{bmatrix} \mathbf{G}_1 \\ \mathbf{G}_2 \end{bmatrix}$, and this latter matrix changes at each stage.

Let $\mathbf{w}' = [w_1 \; w_2 \; w_3 \; w_4]$ be the vector of economic weights; then, at the first and second stages the MLPSI vectors of coefficients are $\mathbf{b}_1' = \mathbf{w}'\mathbf{G}_1'\mathbf{P}_1^{-1} = [b_{11} \; b_{12}]$ and $\mathbf{b}_2' = \mathbf{w}'\mathbf{G}'\mathbf{P}^{-1} = [b_{21} \; b_{22} \; b_{23} \; b_{24}]$ respectively. The selection indices at stages 1 and 2 can be written as $I_1 = b_{11}y_1 + b_{12}y_2 = \mathbf{b}_1'\mathbf{x}_1$ and $I_2 = b_{21}y_1 + b_{22}y_2 + b_{23}y_3 + b_{24}y_4 = \mathbf{b}_2'\mathbf{y}$. These two indices may be correlated, in which case numerical integration is required to find the optimal truncation points and selection intensities (Xu and Muir 1992; Hicks et al. 1998) before obtaining the maximized MLPSI selection response and expected genetic gain per trait.

The accuracy of the MLPSI at stages 1 and 2 can be written as

$$
\rho_{HI_1} = \sqrt{\frac{\mathbf{b}_1' \mathbf{P}_1 \mathbf{b}_1}{\mathbf{w}' \mathbf{C} \mathbf{w}}} \quad \text{and} \quad \rho_{HI_2} = \sqrt{\frac{\mathbf{b}_2' \mathbf{P}^* \mathbf{b}_2}{\mathbf{w}' \mathbf{C}^* \mathbf{w}}}, \tag{9.1}
$$

respectively. Let $k_1$ and $k_2$ be the selection intensities for stages 1 and 2; then, the maximized MLPSI expected genetic gains per trait can be written as

$$\mathbf{E}\_1 = k\_1 \frac{\mathbf{G}\_1^{\prime} \mathbf{b}\_1}{\sqrt{\mathbf{b}\_1^{\prime} \mathbf{P}\_1 \mathbf{b}\_1}} \quad \text{and} \quad \mathbf{E}\_2 = k\_2 \frac{\mathbf{b}\_2^{\prime} \mathbf{C}^\*}{\sqrt{\mathbf{b}\_2^{\prime} \mathbf{P}^\* \mathbf{b}\_2}}, \tag{9.2}$$

and the total expected genetic gain per trait for the two stages is equal to $\mathbf{E}_1 + \mathbf{E}_2$. In a similar manner, the maximized selection responses for both stages are

$$R\_1 = k\_1 \sqrt{\mathbf{b}\_1' \mathbf{P}\_1 \mathbf{b}\_1} \quad \text{and} \quad R\_2 = k\_2 \sqrt{\mathbf{b}\_2' \mathbf{P}^\* \mathbf{b}\_2},\tag{9.3}$$

and the total selection response for the two stages is $R_1 + R_2$. In Eqs. (9.1) to (9.3), matrices $\mathbf{P}^*$ and $\mathbf{C}^*$ are matrices $\mathbf{P}$ and $\mathbf{C}$ respectively, adjusted for previous selection on $I_1 = \mathbf{b}_1'\mathbf{x}_1$. That is, the MLPSI accuracy, expected genetic gain per trait, and selection response at stage 2 are affected by previous selection on $I_1$ (Saxton 1983) and it is necessary to adjust $\mathbf{P}$ and $\mathbf{C}$.

One method for adjusting matrices P and C has been provided by Cochran (1951) and Cunningham (1975). Suppose that X, Y, and W are three jointly normally distributed random variables and that the covariance among them is known, then the covariance between X and Y adjusted for the effects of selection on W can be obtained as

$$\text{Cov}(X, Y)^{*} = \text{Cov}(X, Y) - u \frac{\text{Cov}(X, W)\text{Cov}(Y, W)}{\text{Var}(W)},\tag{9.4}$$

where $u = k_1(k_1 - \tau)$, $k_1$ is the selection intensity at stage 1, and $\tau$ is the truncation point when $I_1 = \mathbf{b}_1'\mathbf{x}_1$ is applied. For example, if the proportion selected at the first stage is 5%, then $k_1 = 2.063$, $\tau = 1.645$, and $u = 0.862$ (Falconer and Mackay 1996, Table A).
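The three quantities in this example can be recomputed from the proportion selected. The following is a minimal sketch, assuming a standard normal selection criterion and one-stage truncation; the function name `truncation_parameters` is hypothetical and used only for illustration.

```python
from statistics import NormalDist

def truncation_parameters(p):
    """Return (k, tau, u) for truncation selection of a proportion p."""
    std_normal = NormalDist()           # standard normal: mean 0, sd 1
    tau = std_normal.inv_cdf(1.0 - p)   # truncation point, Pr(X > tau) = p
    z = std_normal.pdf(tau)             # height of the ordinate at tau
    k = z / p                           # selection intensity
    u = k * (k - tau)                   # factor used in Eq. (9.4)
    return k, tau, u

k1, tau, u = truncation_parameters(0.05)
print(round(k1, 3), round(tau, 3), round(u, 3))  # prints: 2.063 1.645 0.862
```

For a proportion of 5% the sketch reproduces the tabulated values quoted above.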

According to Dekkers (2014), with the result of Eq. (9.4), it is possible to obtain matrices P<sup>∗</sup> and C<sup>∗</sup> using the following two equations:

$$\begin{split} \mathbf{P}^* &= Var(\mathbf{y})^* = \mathbf{P} - u \frac{Cov(\mathbf{y}, \mathbf{x}_1) \mathbf{b}_1 \mathbf{b}_1' Cov(\mathbf{x}_1, \mathbf{y})}{\mathbf{b}_1' Var(\mathbf{x}_1) \mathbf{b}_1} \\ &= \mathbf{P} - u \frac{\begin{bmatrix} \mathbf{P}_1 \\ \mathbf{P}_{21} \end{bmatrix} \mathbf{b}_1 \mathbf{b}_1' \begin{bmatrix} \mathbf{P}_1 & \mathbf{P}_{12} \end{bmatrix}}{\mathbf{b}_1' \mathbf{P}_1 \mathbf{b}_1} \end{split} \tag{9.5}$$

and

$$\mathbf{C}^* = Var(\mathbf{g})^* = \mathbf{C} - u \frac{Cov(\mathbf{g}, \mathbf{x}_1) \mathbf{b}_1 \mathbf{b}_1' Cov(\mathbf{x}_1, \mathbf{g})}{\mathbf{b}_1' Var(\mathbf{x}_1) \mathbf{b}_1} = \mathbf{C} - u \frac{\mathbf{G}_1' \mathbf{b}_1 \mathbf{b}_1' \mathbf{G}_1}{\mathbf{b}_1' \mathbf{P}_1 \mathbf{b}_1}. \tag{9.6}$$
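The step from Eq. (9.4) to Eqs. (9.5) and (9.6) can be spelled out element by element. The lines below are a sketch of that argument, assuming that the selection variable $W$ of Eq. (9.4) is the stage-1 index $I_1 = \mathbf{b}_1'\mathbf{x}_1$, and writing $[\cdot]_i$ for the $i$th element of a vector (a notation introduced here only for this illustration).

$$
\begin{aligned}
\text{Cov}(y_i, I_1) &= \left[\text{Cov}(\mathbf{y}, \mathbf{x}_1)\mathbf{b}_1\right]_i, \qquad \text{Var}(I_1) = \mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1, \\
\text{Cov}(y_i, y_j)^{*} &= \text{Cov}(y_i, y_j) - u\,\frac{\left[\text{Cov}(\mathbf{y}, \mathbf{x}_1)\mathbf{b}_1\right]_i \left[\text{Cov}(\mathbf{y}, \mathbf{x}_1)\mathbf{b}_1\right]_j}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1}.
\end{aligned}
$$

Collecting these elements over all pairs $(i, j)$ gives the matrix form of Eq. (9.5); replacing $\mathbf{y}$ by $\mathbf{g}$, with $\text{Cov}(\mathbf{g}, \mathbf{x}_1) = \mathbf{G}_1'$, gives Eq. (9.6).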

With the Eq. (9.5) result, the correlation between $I_1 = \mathbf{b}_1'\mathbf{x}_1$ and $I_2 = \mathbf{b}_2'\mathbf{y}$ is

$$Corr(I_1, I_2) = \frac{\mathbf{b}_1'[\mathbf{P}_1 \quad \mathbf{P}_{12}]\mathbf{b}_2}{\sqrt{\mathbf{b}_1' \mathbf{P}_1 \mathbf{b}_1}\sqrt{\mathbf{b}_2' \mathbf{P} \mathbf{b}_2}} = \rho_{12},\tag{9.7}$$

where $\sqrt{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1}$ and $\sqrt{\mathbf{b}_2'\mathbf{P}\mathbf{b}_2}$ are the standard deviations of $I_1 = \mathbf{b}_1'\mathbf{x}_1$ and $I_2 = \mathbf{b}_2'\mathbf{y}$ respectively.
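One consequence of these definitions is worth writing down. It is offered here as a derivation sketch, not as a result stated in the source, and it assumes the unrestricted coefficients $\mathbf{b}_1 = \mathbf{P}_1^{-1}\mathbf{G}_1\mathbf{w}$ and $\mathbf{b}_2 = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ of Sect. 9.1.1, with $\text{Cov}(\mathbf{x}_1, \mathbf{y})$ equal to the first $n_1$ rows of $\mathbf{P}$ and $\mathbf{G}_1$ equal to the first $n_1$ rows of $\mathbf{G}$:

$$
\begin{aligned}
\text{Cov}(I_1, I_2) &= \mathbf{b}_1'\,\text{Cov}(\mathbf{x}_1, \mathbf{y})\,\mathbf{b}_2 = \mathbf{b}_1'\,[\mathbf{I} \quad \mathbf{0}]\,\mathbf{P}\mathbf{P}^{-1}\mathbf{G}\mathbf{w} = \mathbf{b}_1'\mathbf{G}_1\mathbf{w} = \mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1, \\
\rho_{12} &= \sqrt{\frac{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1}{\mathbf{b}_2'\mathbf{P}\mathbf{b}_2}}.
\end{aligned}
$$

Under these assumptions the covariance between the two indices equals the variance of $I_1$, so the correlation is the ratio of the two index standard deviations.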

### 9.1.2 The Selection Intensities

Selection intensity $k$ is related to the height of the ordinate of the normal curve ($z$) and the proportion selected ($p$) in the LPSI as $k = z/p$. In the multistage selection context, it is usual to fix the total proportion to be selected ($p$) before selection is carried out and then to determine the unknown proportion $q_i$ ($i = 1, 2, \ldots, N$) for each stage under the restriction

$$p = \prod\_{i=1}^{N} q\_i,\tag{9.8}$$

where $N$ is the number of stages. In the two-stage selection scheme, we would have $p = q_1 q_2$. Based on the fixed proportion $p$ and the $\rho_{12}$ value (Eq. 9.7), Young (1964) used the bivariate truncated normal distribution theory to obtain the selection intensity for two stages. A truncated distribution is a conditional distribution resulting when the domain of the parent distribution is restricted to a smaller region (Hattaway 2010). In the multistage selection context, a truncation occurs when a sample of individuals from the parent distribution is selected as parents for the next selection cycle, thus creating a new population of individuals that follows a truncated normal distribution.

Suppose that $I_1 = \mathbf{b}_1'\mathbf{x}_1$ and $I_2 = \mathbf{b}_2'\mathbf{y}$ have a joint normal distribution and let $I_1$ and $I_2$ be transformed as $v_1 = \frac{I_1 - \mu_{I_1}}{\sigma_{I_1}}$ and $v_2 = \frac{I_2 - \mu_{I_2}}{\sigma_{I_2}}$, each with a mean of zero and a variance of 1, where $\mu_{I_1}$ and $\mu_{I_2}$ are the means, whereas $\sigma_{I_1}$ and $\sigma_{I_2}$ are the standard deviations of $I_1$ and $I_2$ respectively. In this case, the method of selection is to retain animals or plants with $v_1 \geq c_1$ at stage 1 and $v_1 + v_2 \geq c_2$ at stage 2, where $c_1$ and $c_2$ are truncation points for $I_1$ and $I_2$ respectively.

The selected population has a bivariate left truncated normal distribution with a probability density function given by $h(v_1, v_2) = \frac{f(v_1, v_2)}{p}$, where

$$f(v_1, v_2) = \frac{1}{2\pi\sqrt{1 - \rho_{12}^2}} \exp\left\{-\frac{1}{2\left(1 - \rho_{12}^2\right)}\left(v_1^2 + v_2^2 - 2\rho_{12}v_1v_2\right)\right\}$$

and $\rho_{12}$ is the correlation between $v_1$ and $v_2$. The fixed total proportion ($p$) before selection can be written as $p = \int_{c_1}^{\infty}\int_{c_2 - v_1}^{\infty} f(v_1, v_2)\,dv_2\,dv_1$, where $c_1$ and $c_2$ are truncation points for $I_1$ and $I_2$, respectively. Then, as $p$ is fixed, Young (1964) integrated by parts (Thomas 2014)

$$\int_{c_1}^{\infty} \int_{c_2 - v_1}^{\infty} f(v_1, v_2)\, dv_2\, dv_1 \tag{9.9}$$

and found the expectations of $v_1$ and $v_2$ in the selected population, writing the selection intensity values for stages 1 ($k_1$) and 2 ($k_2$) as

$$k\_1 = \frac{z(c\_1)\mathcal{Q}(a)}{p} + \frac{z(c\_3)\mathcal{Q}(b)\sqrt{(1+\rho\_{12})/2}}{p} \tag{9.10}$$

and

$$k\_2 = \frac{\rho\_{12} z(c\_1) \mathcal{Q}(a)}{p} + \frac{z(c\_3) \mathcal{Q}(b) \sqrt{(1 + \rho\_{12})/2}}{p} \tag{9.11}$$

respectively, where $z(c_1) = \frac{\exp\{-0.5c_1^2\}}{\sqrt{2\pi}}$ and $z(c_3) = \frac{\exp\{-0.5c_3^2\}}{\sqrt{2\pi}}$ are the heights of the ordinates of the standard normal distribution at the lowest value of $c_1$ and $c_3 = \frac{c_2}{\sqrt{2(1 + \rho_{12})}}$, and $p$ is the total proportion of the population of animal or plant lines selected; $a = \frac{c_2 - c_1(1 + \rho_{12})}{\sqrt{1 - \rho_{12}^2}}$ and $b = \frac{2c_1 - c_2}{\sqrt{2(1 - \rho_{12})}}$, whereas $Q(a) = 1 - \Phi(a)$ and $Q(b) = 1 - \Phi(b)$ are the complements of the standard normal distribution function; $\Phi(a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} \exp\{-0.5w^2\}\,dw$ and $\Phi(b) = \int_{-\infty}^{b} \frac{1}{\sqrt{2\pi}} \exp\{-0.5t^2\}\,dt$ are probabilities of the standard normal distribution, i.e., $\Phi(a) = \Pr(W \leq a)$ and $\Phi(b) = \Pr(T \leq b)$.

Young (1964) provided figures to obtain values of $c_1$ and $c_2$ when the $\rho_{12}$ values are between $-0.8$ and 0.8, and the $p$ values are between 0.05 and 0.8. For example, suppose that $\rho_{12} = 0.8$ and $p = 0.2$ (or 20%); then, according to Young (1964, Fig. 9), $c_1 = 0.80$ and $c_2 = 1.6$, and to find the selection intensities for the first ($k_1$) and second ($k_2$) stages we need to evaluate Eqs. (9.10) and (9.11). That is, as $c_1 = 0.80$, $c_2 = 1.6$, $\rho_{12} = 0.8$, and $p = 0.2$, then $z(c_1) = \frac{\exp\{-0.5(0.8)^2\}}{\sqrt{2\pi}} = 0.290$, $z(c_3) = \frac{\exp\{-0.5[(1.6)^2/2(1.8)]\}}{\sqrt{2\pi}} = 0.28$, $a = \frac{1.6 - 0.8(1.8)}{\sqrt{1 - (0.8)^2}} = 0.27$, $b = \frac{2(0.8) - 1.6}{\sqrt{2(0.2)}} = 0$, $\Phi(a) = 0.6064$, $\Phi(b) = 0.5$, $Q(a) = 1 - \Phi(a) = 0.3936$, and $Q(b) = 1 - \Phi(b) = 0.5$. Based on these results, the selection intensities for stages 1 and 2 are

$$k\_1 = \frac{(0.29)(0.3936)}{0.2} + \frac{(0.28)(0.5)(0.9)}{0.2} = 0.744 \quad \text{and}$$

$$k\_2 = \frac{(0.8)(0.29)(0.3936)}{0.2} + \frac{(0.28)(0.5)(0.9)}{0.2} = 0.721$$

respectively. Note that the values of $\Phi(a) = 0.6064$ and $\Phi(b) = 0.5$ can be obtained from any table with values showing the area under the curve of the standard normal distribution (e.g., Rausand and Høyland 2004, Table F.1).
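The pieces of Eqs. (9.10) and (9.11) can be evaluated without a normal table. The following is a minimal sketch, assuming the truncation points $c_1$ and $c_2$ have already been read from Young's figures; the function name `two_stage_intensities` and the dictionary of intermediate values are hypothetical conveniences. The intermediate quantities can be compared with those quoted in the example above; the function returns whatever the two sums evaluate to, and those totals are not asserted here against the figures printed in the text.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def two_stage_intensities(c1, c2, rho12, p):
    """Evaluate Eqs. (9.10) and (9.11) for truncation points c1 and c2."""
    phi = NormalDist().cdf                       # standard normal CDF

    def z(c):
        return exp(-0.5 * c * c) / sqrt(2.0 * pi)   # normal ordinate at c

    c3 = c2 / sqrt(2.0 * (1.0 + rho12))
    a = (c2 - c1 * (1.0 + rho12)) / sqrt(1.0 - rho12 ** 2)
    b = (2.0 * c1 - c2) / sqrt(2.0 * (1.0 - rho12))
    q_a, q_b = 1.0 - phi(a), 1.0 - phi(b)        # upper-tail probabilities
    second = z(c3) * q_b * sqrt((1.0 + rho12) / 2.0) / p
    k1 = z(c1) * q_a / p + second                # Eq. (9.10)
    k2 = rho12 * z(c1) * q_a / p + second        # Eq. (9.11)
    parts = {"z_c1": z(c1), "z_c3": z(c3), "a": a, "b": b,
             "Q_a": q_a, "Q_b": q_b}
    return k1, k2, parts

k1, k2, parts = two_stage_intensities(c1=0.8, c2=1.6, rho12=0.8, p=0.2)
for name, value in parts.items():
    print(name, round(value, 4))
print("k1 =", round(k1, 3), "k2 =", round(k2, 3))
```

Because the two equations share their second term and differ only by the factor $\rho_{12}$ on the first, $k_1 \geq k_2$ whenever $0 \leq \rho_{12} \leq 1$.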

One problem with Eqs. (9.10) and (9.11) is that they tend to overestimate the selection intensity values, and therefore the selection response, when the total proportion retained $p$ is lower than 10%. Cochran (1951) gave two equations for obtaining selection intensities in the two-stage context, but his equations also overestimate the selection intensity values when $p$ is lower than 10%. To date, there is no accurate method for estimating selection intensities for two or more stages in the MLPSI context. Mi et al. (2014) have developed an R package called selectiongain that enables calculation of the OMLPSI selection response for up to 20 selection stages. Selectiongain uses raw integration to obtain the first moment of a lower truncated multivariate standard normal distribution and then estimates the OMLPSI selection response at each stage; however, this integral requires complex numerical algorithms with no convergence criteria (Arismendi 2013) and could also overestimate the selection intensity at each stage.

### 9.1.3 Numerical Example

To illustrate the two-stage selection theory, we use the poultry data of Xu and Muir (1992). This data set contains four traits: age at sexual maturity, defined as the age (in days) at which the first trap-nested egg was laid ($y_1$); rate of lay, defined as 100 times (total eggs in the laying period)/(total days in the laying period) ($y_2$); body weight (in pounds) measured at 32 weeks of age ($y_3$); and average egg weight (in ounces per dozen) of all the eggs laid up to 32 weeks of age ($y_4$). The estimated phenotypic and genetic covariance matrices were

$$
\hat{\mathbf{P}} = \begin{bmatrix}
137.178 & -90.957 & 0.136 & 0.564 \\
-90.957 & 201.558 & 1.103 & -1.231 \\
0.136 & 1.103 & 0.202 & 0.104 \\
0.564 & -1.231 & 0.104 & 2.874
\end{bmatrix} \quad \text{and} \quad
\hat{\mathbf{C}} = \begin{bmatrix}
14.634 & -18.356 & -0.109 & 1.233 \\
-18.356 & 32.029 & 0.103 & -2.574 \\
-0.109 & 0.103 & 0.089 & 0.023 \\
1.233 & -2.574 & 0.023 & 1.225
\end{bmatrix}
$$

respectively, whereas the vector of economic weights for the four traits was $\mathbf{w}' = [-3.555 \;\; 19.536 \;\; -113.746 \;\; 48.307]$. Suppose that at the first and second stages we select two traits ($n_1 = n_2 = 2$); then, $\mathbf{y}' = [\mathbf{x}_1' \; \mathbf{x}_2']$, where $\mathbf{x}_1' = [y_1 \; y_2]$ and $\mathbf{x}_2' = [y_3 \; y_4]$. The estimated phenotypic ($\hat{\mathbf{P}}_1$) and genetic ($\hat{\mathbf{G}}_1$) covariance matrices for the first stage were

$$
\hat{\mathbf{P}}_1 = \begin{bmatrix}
137.178 & -90.957 \\
-90.957 & 201.558
\end{bmatrix} \quad \text{and} \quad
\hat{\mathbf{G}}_1 = \begin{bmatrix}
14.634 & -18.356 & -0.109 & 1.233 \\
-18.356 & 32.029 & 0.103 & -2.574
\end{bmatrix}
$$

respectively. For the first and second stages, the estimated MLPSI vectors of coefficients were $\hat{\mathbf{b}}_1' = \mathbf{w}'\hat{\mathbf{G}}_1'\hat{\mathbf{P}}_1^{-1} = [-0.918 \;\; 2.339]$ and $\hat{\mathbf{b}}_2' = \mathbf{w}'\hat{\mathbf{C}}\hat{\mathbf{P}}^{-1} = [-0.59 \;\; 2.78 \;\; -49.45 \;\; 3.75]$ respectively.

The estimated correlation value between the estimated indices $\hat{I}_1 = \hat{\mathbf{b}}_1'\mathbf{x}_1$ and $\hat{I}_2 = \hat{\mathbf{b}}_2'\mathbf{y}$ was

$$
\hat{\rho}_{12} = \frac{\hat{\mathbf{b}}_1'[\hat{\mathbf{P}}_1 \quad \hat{\mathbf{P}}_{12}]\hat{\mathbf{b}}_2}{\sqrt{\hat{\mathbf{b}}_1'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1}\sqrt{\hat{\mathbf{b}}_2'\hat{\mathbf{P}}\hat{\mathbf{b}}_2}} = 0.88,
$$

where $\sqrt{\hat{\mathbf{b}}_1'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1}$ and $\sqrt{\hat{\mathbf{b}}_2'\hat{\mathbf{P}}\hat{\mathbf{b}}_2}$ were the estimated standard deviations of $\hat{I}_1$ and $\hat{I}_2$ respectively. Assuming that $p = 0.2$ (or 20%), an approximate selection intensity for the first stage was $k_1 = 0.744$, whence the estimated MLPSI selection response, expected genetic gain per trait, and accuracy were

$$
\hat{R}_1 = k_1\sqrt{\hat{\mathbf{b}}_1'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1} = 29.85, \quad
\hat{\mathbf{E}}_1' = k_1\frac{\hat{\mathbf{b}}_1'\hat{\mathbf{G}}_1}{\sqrt{\hat{\mathbf{b}}_1'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1}} = [-1.046 \;\; 1.702 \;\; 0.006 \;\; -0.133], \quad \text{and} \quad
\hat{\rho}_{HI_1} = \sqrt{\frac{\hat{\mathbf{b}}_1'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1}{\mathbf{w}'\hat{\mathbf{C}}\mathbf{w}}} = 0.353
$$

respectively.
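The stage-1 figures can be recomputed from the covariance estimates. The following is a minimal sketch in plain Python, assuming that the matrix entries carry the signs that reproduce the coefficients and gains reported in this section and that $k_1 = 0.744$ is taken as given; the helper `dot` and all variable names are illustrative.

```python
from math import sqrt

# Poultry estimates (Xu and Muir 1992); signs chosen to match the reported results.
P1 = [[137.178, -90.957],
      [-90.957, 201.558]]                   # Var(x1) for traits y1, y2
G1 = [[14.634, -18.356, -0.109, 1.233],
      [-18.356, 32.029, 0.103, -2.574]]     # Cov(x1, g)
C = [[14.634, -18.356, -0.109, 1.233],
     [-18.356, 32.029, 0.103, -2.574],
     [-0.109, 0.103, 0.089, 0.023],
     [1.233, -2.574, 0.023, 1.225]]         # Var(g)
w = [-3.555, 19.536, -113.746, 48.307]      # economic weights
k1 = 0.744                                  # stage-1 intensity taken from the text

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# b1 = P1^{-1} G1 w, solved with Cramer's rule for the 2 x 2 system.
rhs = [dot(row, w) for row in G1]
det = P1[0][0] * P1[1][1] - P1[0][1] * P1[1][0]
b1 = [(P1[1][1] * rhs[0] - P1[0][1] * rhs[1]) / det,
      (P1[0][0] * rhs[1] - P1[1][0] * rhs[0]) / det]

var_I1 = dot(b1, [dot(row, b1) for row in P1])               # b1' P1 b1
R1 = k1 * sqrt(var_I1)                                       # Eq. (9.3)
G1t_b1 = [dot([G1[0][j], G1[1][j]], b1) for j in range(4)]   # G1' b1
E1 = [k1 * v / sqrt(var_I1) for v in G1t_b1]                 # Eq. (9.2)
acc1 = sqrt(var_I1 / dot(w, [dot(row, w) for row in C]))     # Eq. (9.1)

print([round(v, 3) for v in b1], round(R1, 2),
      [round(v, 3) for v in E1], round(acc1, 3))
```

Small differences in the last printed digit relative to the text are to be expected, because the inputs are rounded to three decimals.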

According to the $k_1 = 0.744$ value, the approximate value of $u$ was $u = 0.554$, and by Eqs. (9.5) and (9.6), the estimated and adjusted phenotypic ($\hat{\mathbf{P}}^*$) and genetic ($\hat{\mathbf{C}}^*$) covariance matrices for the second stage were

$$
\hat{\mathbf{P}}^{\*} = \begin{bmatrix}
97.682 & -26.241 & 0.422 & 0.168 \\
0.422 & 0.634 & 0.200 & 0.107 \\
0.168 & -0.582 & 0.107 & 2.870
\end{bmatrix} \text{ and}
$$

$$
\hat{\mathbf{C}}^{\*} = \begin{bmatrix}
13.540 & -16.575 & -0.102 & 1.094 \\
1.094 & -2.384 & 0.024 & 1.207 \\
\end{bmatrix}, \text{ respectively.}
$$

For the second stage, the approximated selection intensity was $k_2 = 0.721$, whereas the estimated MLPSI selection response, expected genetic gain per trait, and accuracy were

$$
\hat{R}_2 = k_2\sqrt{\hat{\mathbf{b}}_2'\hat{\mathbf{P}}^*\hat{\mathbf{b}}_2} = 24.84, \quad
\hat{\mathbf{E}}_2' = k_2\frac{\hat{\mathbf{b}}_2'\hat{\mathbf{C}}^*}{\sqrt{\hat{\mathbf{b}}_2'\hat{\mathbf{P}}^*\hat{\mathbf{b}}_2}} = [-0.443 \;\; 0.804 \;\; -0.087 \;\; -0.087], \quad \text{and} \quad
\hat{\rho}_{HI_2} = \sqrt{\frac{\hat{\mathbf{b}}_2'\hat{\mathbf{P}}^*\hat{\mathbf{b}}_2}{\mathbf{w}'\hat{\mathbf{C}}^*\mathbf{w}}} = 0.314
$$

respectively. Finally, the total estimated MLPSI selection response and expected genetic gain per trait were $\hat{R}_1 + \hat{R}_2 = 54.69$ and $\hat{\mathbf{E}}_1' + \hat{\mathbf{E}}_2' = [-1.488 \;\; 2.506 \;\; -0.081 \;\; -0.219]$.
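The stage-2 adjustment can be followed step by step in code. The following is a minimal sketch in plain Python, assuming the same sign convention for the poultry matrices as the results reported in this section, taking $u = 0.554$ and $k_2 = 0.721$ as given, and using $\hat{\mathbf{G}}_1$ equal to the first two rows of $\hat{\mathbf{C}}$ as in the example; the helpers `dot`, `matvec`, `solve`, and `adjust` are hypothetical names introduced for illustration.

```python
from math import sqrt

P = [[137.178, -90.957, 0.136, 0.564],
     [-90.957, 201.558, 1.103, -1.231],
     [0.136, 1.103, 0.202, 0.104],
     [0.564, -1.231, 0.104, 2.874]]      # Var(y)
C = [[14.634, -18.356, -0.109, 1.233],
     [-18.356, 32.029, 0.103, -2.574],
     [-0.109, 0.103, 0.089, 0.023],
     [1.233, -2.574, 0.023, 1.225]]      # Var(g)
w = [-3.555, 19.536, -113.746, 48.307]
u, k2, n1 = 0.554, 0.721, 2              # values taken from the text

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Stage-1 and stage-2 coefficients: b1 = P1^{-1} G1 w and b2 = P^{-1} C w.
b1 = solve([row[:n1] for row in P[:n1]], matvec(C[:n1], w))
b2 = solve(P, matvec(C, w))

cov_y_I1 = [dot(row[:n1], b1) for row in P]   # Cov(y, x1) b1
cov_g_I1 = [dot(row[:n1], b1) for row in C]   # G1' b1
var_I1 = dot(b1, cov_y_I1[:n1])               # b1' P1 b1

def adjust(M, v):
    """Eqs. (9.5) and (9.6): M* = M - u v v' / Var(I1)."""
    return [[M[i][j] - u * v[i] * v[j] / var_I1 for j in range(4)]
            for i in range(4)]

P_star, C_star = adjust(P, cov_y_I1), adjust(C, cov_g_I1)
rho12 = dot(cov_y_I1, b2) / sqrt(var_I1 * dot(b2, matvec(P, b2)))   # Eq. (9.7)
var_I2 = dot(b2, matvec(P_star, b2))                                # b2' P* b2
R2 = k2 * sqrt(var_I2)                                              # Eq. (9.3)
acc2 = sqrt(var_I2 / dot(w, matvec(C_star, w)))                     # Eq. (9.1)

for row in P_star:
    print([round(v, 3) for v in row])
print(round(rho12, 2), round(R2, 2), round(acc2, 3))
```

The sketch prints every row of the adjusted phenotypic matrix, which makes it easy to compare individual entries with those displayed above; `C_star` can be printed in the same way.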

#### 9.2 The Multistage Restricted Linear Phenotypic Selection Index

The multistage restricted linear phenotypic selection index (MRLPSI) is an extension of the null restricted linear phenotypic selection index (RLPSI) described in Chap. 3 to the multistage case; thus, the theoretical results of the MRLPSI are very similar to those of the RLPSI. The MRLPSI allows restrictions equal to zero to be imposed on the expected genetic gains of some traits, whereas other traits increase (or decrease) their expected genetic gains without any restrictions being imposed.

### 9.2.1 The MRLPSI Parameters for Two Stages

In Chap. 3, we indicated that vector $\mathbf{b}_R = \mathbf{K}\mathbf{b}$ is a linear transformation of the LPSI vector of coefficients ($\mathbf{b}$) made by the projector matrix $\mathbf{K}$, and that matrix $\mathbf{K}$ is idempotent ($\mathbf{K} = \mathbf{K}^2$) and projects $\mathbf{b}$ into a space smaller than the original space of $\mathbf{b}$. The reduction of the space into which matrix $\mathbf{K}$ projects $\mathbf{b}$ is equal to the number of zeros that appear in the expected genetic gain per trait. Hence, the MRLPSI vectors of coefficients for stages 1 and 2 should be linear transformations of the MLPSI vectors of coefficients at stages 1 ($\mathbf{b}_1 = \mathbf{P}_1^{-1}\mathbf{G}_1\mathbf{w}$) and 2 ($\mathbf{b}_2 = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$) described in Sect. 9.1.1 of this chapter, and should be written as

$$\mathbf{b}_{R_1} = \mathbf{K}_1 \mathbf{b}_1 \tag{9.12}$$

and

$$\mathbf{b}\_{R\_2} = \mathbf{K}\_2 \mathbf{b}\_2,\tag{9.13}$$

respectively, where, at stage 1, $\mathbf{K}_1 = [\mathbf{I}_1 - \mathbf{Q}_1]$, $\mathbf{Q}_1 = \mathbf{P}_1^{-1}\boldsymbol{\Psi}_1\left(\boldsymbol{\Psi}_1'\mathbf{P}_1^{-1}\boldsymbol{\Psi}_1\right)^{-1}\boldsymbol{\Psi}_1'$, $\boldsymbol{\Psi}_1' = \mathbf{U}'\mathbf{G}_1'$, $\mathbf{I}_1$ is an identity matrix of the same size as $\mathbf{P}_1$, and $\mathbf{P}_1^{-1}$ is the inverse of matrix $\mathbf{P}_1$. At stage 2, $\mathbf{K}_2 = [\mathbf{I}_2 - \mathbf{Q}_2]$, $\mathbf{Q}_2 = \mathbf{P}^{-1}\boldsymbol{\Psi}_2\left(\boldsymbol{\Psi}_2'\mathbf{P}^{-1}\boldsymbol{\Psi}_2\right)^{-1}\boldsymbol{\Psi}_2'$, $\boldsymbol{\Psi}_2' = \mathbf{U}'\mathbf{C}$, $\mathbf{I}_2$ is an identity matrix of the same size as $\mathbf{P}$, and $\mathbf{P}^{-1}$ is the inverse of matrix $\mathbf{P}$. By Eqs. (9.12) and (9.13), the MRLPSI for stages 1 and 2 can be written as $I_1 = \mathbf{b}_{R_1}'\mathbf{x}_1$ and $I_2 = \mathbf{b}_{R_2}'\mathbf{y}$, where $\mathbf{y}' = [\mathbf{x}_1' \; \mathbf{x}_2']$; $\mathbf{x}_1'$ and $\mathbf{x}_2'$ are the vectors of traits that become evident at the first and second stages respectively.
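That $\mathbf{K}_1$ enforces the null restrictions can be checked directly from these definitions. The lines below are a short verification sketch for stage 1; the same steps apply to $\mathbf{K}_2$ with $\mathbf{P}$ and $\boldsymbol{\Psi}_2$ in place of $\mathbf{P}_1$ and $\boldsymbol{\Psi}_1$.

$$
\begin{aligned}
\boldsymbol{\Psi}_1'\mathbf{Q}_1 &= \boldsymbol{\Psi}_1'\mathbf{P}_1^{-1}\boldsymbol{\Psi}_1\left(\boldsymbol{\Psi}_1'\mathbf{P}_1^{-1}\boldsymbol{\Psi}_1\right)^{-1}\boldsymbol{\Psi}_1' = \boldsymbol{\Psi}_1', \\
\boldsymbol{\Psi}_1'\mathbf{b}_{R_1} &= \boldsymbol{\Psi}_1'\left(\mathbf{I}_1 - \mathbf{Q}_1\right)\mathbf{b}_1 = \mathbf{0}, \\
\mathbf{Q}_1^2 &= \mathbf{Q}_1 \;\Rightarrow\; \mathbf{K}_1^2 = \mathbf{I}_1 - 2\mathbf{Q}_1 + \mathbf{Q}_1^2 = \mathbf{K}_1 .
\end{aligned}
$$

Because $\boldsymbol{\Psi}_1'\mathbf{b}_{R_1} = \mathbf{U}'\mathbf{G}_1'\mathbf{b}_{R_1}$ is proportional to the stage-1 expected genetic gains of the restricted traits, those gains are zero.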

Let $k_1$ and $k_2$ be the selection intensities for stages 1 and 2 (Eqs. 9.10 and 9.11) respectively, and let $\mathbf{P}^*$ and $\mathbf{C}^*$ be the covariance matrices adjusted in the MRLPSI context according to Eqs. (9.5) and (9.6) respectively. The maximized MRLPSI selection response, expected genetic gain per trait, and accuracy at stages 1 and 2 can be written as

$$R_{R_1} = k_1 \sqrt{\mathbf{b}_{R_1}' \mathbf{P}_1 \mathbf{b}_{R_1}} \quad \text{and} \quad R_{R_2} = k_2 \sqrt{\mathbf{b}_{R_2}' \mathbf{P}^* \mathbf{b}_{R_2}},\tag{9.14}$$

$$\mathbf{E}\_{R\_1} = k\_1 \frac{\mathbf{G}\_1' \mathbf{b}\_{R\_1}}{\sqrt{\mathbf{b}\_{R\_1}' \mathbf{P}\_1 \mathbf{b}\_{R\_1}}} \quad \text{and} \quad \mathbf{E}\_{R\_2} = k\_2 \frac{\mathbf{b}\_{R\_2}' \mathbf{C}^\*}{\sqrt{\mathbf{b}\_{R\_2}' \mathbf{P}^\* \mathbf{b}\_{R\_2}}} \tag{9.15}$$

and

$$\rho\_{R\_1} = \sqrt{\frac{\mathbf{b}\_{R\_1}^{\prime} \mathbf{P}\_1 \mathbf{b}\_{R\_1}}{\mathbf{w}^{\prime} \mathbf{C} \mathbf{w}}} \quad \text{and} \quad \rho\_{R\_2} = \sqrt{\frac{\mathbf{b}\_{R\_2}^{\prime} \mathbf{P}^\* \mathbf{b}\_{R\_2}}{\mathbf{w}^{\prime} \mathbf{C}^\* \mathbf{w}}},\tag{9.16}$$

respectively, whereas the total MRLPSI selection response and expected genetic gain per trait for both stages are equal to $R_{R_1} + R_{R_2}$ and $\mathbf{E}_{R_1} + \mathbf{E}_{R_2}$.
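The construction of the stage-1 projector can be sketched numerically. The following minimal example in plain Python uses the White Leghorn estimates of Hicks et al. (1998) quoted in Sect. 9.2.2, with traits $y_1$ and $y_2$ restricted, so that $\boldsymbol{\Psi}_1$ is formed by the first two columns of the stage-1 genetic covariance matrix; the helpers `matmul`, `transpose`, and `inverse` are hypothetical names introduced for illustration.

```python
# Stage-1 phenotypic covariance matrix (traits y1 to y4).
P1 = [[102, 32, 14, 4],
      [32, 80, 80, 16],
      [14, 80, 298, 78],
      [4, 16, 78, 66]]
# Psi1 = G1 U: covariances of x1 with the restricted breeding values g1, g2.
Psi1 = [[44, 11],
        [11, 26],
        [-11, 24],
        [-3, 7]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [x / d for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

X = matmul(inverse(P1), Psi1)                        # P1^{-1} Psi1
middle = inverse(matmul(transpose(Psi1), X))         # (Psi1' P1^{-1} Psi1)^{-1}
Q1 = matmul(matmul(X, middle), transpose(Psi1))
K1 = [[(1.0 if i == j else 0.0) - Q1[i][j] for j in range(4)]
      for i in range(4)]

for row in K1:
    print([round(v, 3) for v in row])
```

Multiplying `transpose(Psi1)` by `K1` returns a matrix of zeros up to rounding error, which is the numerical counterpart of the null restrictions on the two restricted traits.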

### 9.2.2 Numerical Examples

To illustrate the MRLPSI theory for a two-stage selection breeding scheme, we use the real data set of the White Leghorn chickens of Hicks et al. (1998). This data set comprises six traits ($y_1$ to $y_6$), which are records of the number of eggs laid during different periods: from week 0 through 4 ($y_1$), 4 through 8 ($y_2$), 8 through 28 ($y_3$), 28 through 32 ($y_4$), 32 through 36 ($y_5$), and 36 through 52 ($y_6$) respectively. The estimated phenotypic and genotypic covariance matrices were

$$
\widehat{\mathbf{P}} = \begin{bmatrix} 102 & 32 & 14 & 4 & 3 & -1 \\ 32 & 80 & 80 & 16 & 17 & 7 \\ 14 & 80 & 298 & 78 & 112 & 62 \\ 4 & 16 & 78 & 66 & 80 & 51 \\ 3 & 17 & 112 & 80 & 135 & 49 \\ -1 & 7 & 62 & 51 & 49 & 98 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{C}} = \begin{bmatrix} 44 & 11 & -11 & -3 & -8 & -3 \\ 11 & 26 & 24 & 7 & 7 & 3 \\ -11 & 24 & 62 & 23 & 37 & 20 \\ -3 & 7 & 23 & 14 & 23 & 14 \\ -8 & 7 & 37 & 23 & 42 & 25 \\ -3 & 3 & 20 & 14 & 25 & 18 \end{bmatrix},
$$

respectively, and $\mathbf{w}' = [0.08 \;\; 0.08 \;\; 0.38 \;\; 0.08 \;\; 0.08 \;\; 0.31]$ was the vector of economic weights.

Let $\mathbf{y}' = \begin{bmatrix} y_1 & y_2 & y_3 & y_4 & y_5 & y_6 \end{bmatrix}$ and $\mathbf{g}' = \begin{bmatrix} g_1 & g_2 & g_3 & g_4 & g_5 & g_6 \end{bmatrix}$ be the vectors of observed phenotypic and unobserved genotypic values respectively, and suppose that at stage 1 we select four traits and at stage 2 we select two traits. Then $\mathbf{x}_1' = \begin{bmatrix} y_1 & y_2 & y_3 & y_4 \end{bmatrix}$ and $\mathbf{x}_2' = \begin{bmatrix} y_5 & y_6 \end{bmatrix}$ are the vectors of observations at stages 1 and 2 respectively, whereas $\mathbf{y}' = \begin{bmatrix} \mathbf{x}_1' & \mathbf{x}_2' \end{bmatrix}$ is the vector of total observations at stage 2. We need to estimate vectors $\mathbf{b}_{R_1}' = \mathbf{b}_1'\mathbf{K}_1'$ and $\mathbf{b}_{R_2}' = \mathbf{b}_2'\mathbf{K}_2'$, where $\mathbf{b}_1' = \mathbf{w}'\mathbf{G}_1'\mathbf{P}_1^{-1}$ and $\mathbf{b}_2' = \mathbf{w}'\mathbf{C}\mathbf{P}^{-1}$. In Chap. 3, we described methods of estimating matrices $\mathbf{K}_1 = [\mathbf{I}_1 - \mathbf{Q}_1]$, $\mathbf{Q}_1 = \mathbf{P}_1^{-1}\boldsymbol{\Psi}_1(\boldsymbol{\Psi}_1'\mathbf{P}_1^{-1}\boldsymbol{\Psi}_1)^{-1}\boldsymbol{\Psi}_1'$, $\boldsymbol{\Psi}_1' = \mathbf{U}'\mathbf{G}_1'$, $\mathbf{K}_2 = [\mathbf{I}_2 - \mathbf{Q}_2]$, $\mathbf{Q}_2 = \mathbf{P}^{-1}\boldsymbol{\Psi}_2(\boldsymbol{\Psi}_2'\mathbf{P}^{-1}\boldsymbol{\Psi}_2)^{-1}\boldsymbol{\Psi}_2'$, and $\boldsymbol{\Psi}_2' = \mathbf{U}'\mathbf{C}$, which are used in this subsection. At stage 1, the estimated phenotypic and genotypic covariance matrices were

$$
\widehat{\mathbf{P}}_1 = \begin{bmatrix} 102 & 32 & 14 & 4 \\ 32 & 80 & 80 & 16 \\ 14 & 80 & 298 & 78 \\ 4 & 16 & 78 & 66 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{G}}_1 = \begin{bmatrix} 44 & 11 & -11 & -3 & -8 & -3 \\ 11 & 26 & 24 & 7 & 7 & 3 \\ -11 & 24 & 62 & 23 & 37 & 20 \\ -3 & 7 & 23 & 14 & 22 & 14 \end{bmatrix}
$$

respectively. At both stages, traits $y_1$ and $y_2$ are restricted. Matrix $\mathbf{U}$ can be written as

$$
\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix},
$$

whence the estimated matrix of restrictions was

$$
\widehat{\boldsymbol{\Psi}}_1' = \mathbf{U}'\widehat{\mathbf{G}}_1' = \begin{bmatrix} 44 & 11 & -11 & -3 \\ 11 & 26 & 24 & 7 \end{bmatrix};
$$

therefore, the estimated matrices of $\mathbf{Q}_1 = \mathbf{P}_1^{-1}\boldsymbol{\Psi}_1(\boldsymbol{\Psi}_1'\mathbf{P}_1^{-1}\boldsymbol{\Psi}_1)^{-1}\boldsymbol{\Psi}_1'$ and $\mathbf{K}_1 = [\mathbf{I}_4 - \mathbf{Q}_1]$ were

$$
\widehat{\mathbf{Q}}_1 = \widehat{\mathbf{P}}_1^{-1}\widehat{\boldsymbol{\Psi}}_1\left(\widehat{\boldsymbol{\Psi}}_1'\widehat{\mathbf{P}}_1^{-1}\widehat{\boldsymbol{\Psi}}_1\right)^{-1}\widehat{\boldsymbol{\Psi}}_1' = \begin{bmatrix} 0.923 & -0.013 & -0.511 & -0.144 \\ 0.164 & 1.026 & 1.093 & 0.317 \\ -0.145 & -0.069 & -0.001 & -0.001 \\ 0.010 & 0.159 & 0.178 & 0.052 \end{bmatrix}
$$

and

$$
\widehat{\mathbf{K}}_1 = \mathbf{I}_4 - \widehat{\mathbf{Q}}_1 = \begin{bmatrix} 0.077 & 0.013 & 0.511 & 0.144 \\ -0.164 & -0.026 & -1.093 & -0.317 \\ 0.145 & 0.069 & 1.001 & 0.001 \\ -0.010 & -0.159 & -0.178 & 0.948 \end{bmatrix}
$$

respectively, where $\mathbf{I}_4$ is an identity matrix of size $4 \times 4$.

The estimated vector $\mathbf{b}_{R_1}' = \mathbf{b}_1'\mathbf{K}_1'$ was $\widehat{\mathbf{b}}_{R_1}' = \widehat{\mathbf{b}}_1'\widehat{\mathbf{K}}_1' = \begin{bmatrix} 0.044 & -0.095 & 0.045 & 0.131 \end{bmatrix}$, where $\widehat{\mathbf{b}}_1' = \mathbf{w}'\widehat{\mathbf{G}}_1'\widehat{\mathbf{P}}_1^{-1} = \begin{bmatrix} -0.067 & 0.125 & 0.045 & 0.167 \end{bmatrix}$, and $\widehat{I}_{R_1} = \widehat{\mathbf{b}}_{R_1}'\mathbf{x}_1$ was the estimated MRLPSI at stage 1. The estimated MRLPSI vector of coefficients at stage 2 was $\widehat{\mathbf{b}}_{R_2}' = \widehat{\mathbf{b}}_2'\widehat{\mathbf{K}}_2' = \begin{bmatrix} 0.045 & -0.068 & 0.028 & -0.057 & 0.099 & 0.106 \end{bmatrix}$ and $\widehat{I}_{R_2} = \widehat{\mathbf{b}}_{R_2}'\mathbf{y}$ was the estimated MRLPSI at stage 2.
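The stage 1 computation can be retraced numerically. The following is a minimal sketch in plain Python, assuming the rounded matrices printed above (with minus signs taken from $\widehat{\mathbf{C}}$) and all-positive economic weights; the helper names `matmul`, `transpose`, and `solve` are illustrative and not part of the original text. Because the inputs are rounded, the output should agree with the reported vectors only to about two or three decimals.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Rounded estimates printed in the text (stage 1: four observed traits).
P1 = [[102, 32, 14, 4], [32, 80, 80, 16], [14, 80, 298, 78], [4, 16, 78, 66]]
G1 = [[44, 11, -11, -3, -8, -3],
      [11, 26, 24, 7, 7, 3],
      [-11, 24, 62, 23, 37, 20],
      [-3, 7, 23, 14, 22, 14]]
w = [[0.08], [0.08], [0.38], [0.08], [0.08], [0.31]]

# Psi1 = G1 U: covariances of x1 with the two restricted traits (g1, g2).
Psi1 = [row[:2] for row in G1]

b1 = solve(P1, matmul(G1, w))                  # unrestricted MLPSI coefficients
X = solve(P1, Psi1)                            # P1^{-1} Psi1
S = matmul(transpose(Psi1), X)                 # Psi1' P1^{-1} Psi1
Q1 = matmul(X, solve(S, transpose(Psi1)))      # projector Q1
K1 = [[(1.0 if i == j else 0.0) - Q1[i][j] for j in range(4)] for i in range(4)]
bR1 = matmul(K1, b1)                           # restricted coefficients K1 b1
constraint = matmul(transpose(Psi1), bR1)      # Psi1' b_R1, zero up to rounding

print([round(v[0], 3) for v in b1])
print([round(v[0], 3) for v in bR1])
print(constraint)
```

The last line illustrates why the restriction works: $\boldsymbol{\Psi}_1'\mathbf{K}_1 = \mathbf{0}$, so the covariance of the index with the restricted traits vanishes and their expected gains are zero.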

The estimated correlation value ($\widehat{\rho}_{R_{12}}$) between $\widehat{I}_{R_1} = \widehat{\mathbf{b}}_{R_1}'\mathbf{x}_1$ and $\widehat{I}_{R_2} = \widehat{\mathbf{b}}_{R_2}'\mathbf{y}$ was

$$
\widehat{\rho}_{R_{12}} = \frac{\widehat{\mathbf{b}}_{R_1}'\begin{bmatrix} \widehat{\mathbf{P}}_1 & \widehat{\mathbf{P}}_{21} \end{bmatrix}\widehat{\mathbf{b}}_{R_2}}{\sqrt{\widehat{\mathbf{b}}_{R_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{R_1}}\sqrt{\widehat{\mathbf{b}}_{R_2}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_2}}} = 0.564,
$$

where $\sqrt{\widehat{\mathbf{b}}_{R_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{R_1}}$ and $\sqrt{\widehat{\mathbf{b}}_{R_2}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{R_2}}$ are the estimated standard deviations of $\widehat{I}_{R_1} = \widehat{\mathbf{b}}_{R_1}'\mathbf{x}_1$ and $\widehat{I}_{R_2} = \widehat{\mathbf{b}}_{R_2}'\mathbf{y}$ respectively. According to Young (1964, Fig. 8), and Eqs. (9.10) and (9.11), the selection intensities for stages 1 and 2 were $k_1 = 0.641$ and $k_2 = 0.593$ respectively. The estimated selection responses and expected genetic gains per trait for both stages were

$$
\widehat{R}_{R_1} = k_1\sqrt{\widehat{\mathbf{b}}_{R_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{R_1}} = 0.973 \quad \text{and} \quad \widehat{R}_{R_2} = k_2\sqrt{\widehat{\mathbf{b}}_{R_2}'\widehat{\mathbf{P}}^{*}\widehat{\mathbf{b}}_{R_2}} = 0.930,
$$

$$
\widehat{\mathbf{E}}_{R_1}' = k_1\frac{\widehat{\mathbf{G}}_1'\widehat{\mathbf{b}}_{R_1}}{\sqrt{\widehat{\mathbf{b}}_{R_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{R_1}}} = \begin{bmatrix} 0 & 0 & 1.271 & 0.870 & 1.482 & 0.974 \end{bmatrix}
$$

and

$$
\widehat{\mathbf{E}}_{R_2}' = k_2\frac{\widehat{\mathbf{C}}^{*\prime}\widehat{\mathbf{b}}_{R_2}}{\sqrt{\widehat{\mathbf{b}}_{R_2}'\widehat{\mathbf{P}}^{*}\widehat{\mathbf{b}}_{R_2}}} = \begin{bmatrix} 0 & 0 & 1.419 & 1.014 & 2.037 & 1.349 \end{bmatrix},
$$

whereas $\widehat{R}_{R_1} + \widehat{R}_{R_2} = 1.903$ and $\widehat{\mathbf{E}}_{R_1}' + \widehat{\mathbf{E}}_{R_2}' = \begin{bmatrix} 0 & 0 & 2.691 & 1.884 & 3.519 & 2.322 \end{bmatrix}$ were the total estimated MRLPSI selection response and expected genetic gain per trait respectively.

Finally, the estimated MRLPSI accuracy at stage 1 was $\widehat{\rho}_{R_1} = \sqrt{\dfrac{\widehat{\mathbf{b}}_{R_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{R_1}}{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}} = 0.320$ and at stage 2 it was $\widehat{\rho}_{R_2} = \sqrt{\dfrac{\widehat{\mathbf{b}}_{R_2}'\widehat{\mathbf{P}}^{*}\widehat{\mathbf{b}}_{R_2}}{\mathbf{w}'\widehat{\mathbf{C}}^{*}\mathbf{w}}} = 0.334$. In this case, $\widehat{\rho}_{R_2} > \widehat{\rho}_{R_1}$. We can explain this result by noting that $\widehat{\rho}_{R_2}$ was obtained with six traits, whereas $\widehat{\rho}_{R_1}$ was obtained with only four traits, two of them restricted.
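As a rough consistency check on the stage 1 figures, note that $\widehat{R}_{R_1} = k_1\sqrt{\widehat{\mathbf{b}}_{R_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{R_1}}$, so the index standard deviation is $\widehat{R}_{R_1}/k_1$ and the accuracy is that value divided by $\sqrt{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}$. The sketch below assumes the rounded $\widehat{\mathbf{C}}$ and all-positive economic weights given in this section; variable names are illustrative. With rounded inputs the result is expected to be close to, not identical with, the reported 0.320.

```python
import math

# Rounded genotypic covariance matrix and economic weights from this section.
C = [[44, 11, -11, -3, -8, -3],
     [11, 26, 24, 7, 7, 3],
     [-11, 24, 62, 23, 37, 20],
     [-3, 7, 23, 14, 23, 14],
     [-8, 7, 37, 23, 42, 25],
     [-3, 3, 20, 14, 25, 18]]
w = [0.08, 0.08, 0.38, 0.08, 0.08, 0.31]

k1 = 0.641      # stage 1 selection intensity reported in the text
R_R1 = 0.973    # stage 1 selection response reported in the text

# Variance of the net genetic merit H = w'g.
Cw = [sum(c * x for c, x in zip(row, w)) for row in C]
var_H = sum(x * y for x, y in zip(w, Cw))

# Standard deviation of the stage 1 index, recovered from R = k * sd.
sd_I1 = R_R1 / k1

# Accuracy = sd of the index / sd of the net genetic merit.
rho_R1 = sd_I1 / math.sqrt(var_H)

print(round(var_H, 3), round(sd_I1, 3), round(rho_R1, 3))
```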

#### 9.3 The Multistage Predetermined Proportional Gain Linear Phenotypic Selection Index

The main objectives of the multistage predetermined proportional gain linear phenotypic selection index (MPPG-LPSI) are the same as those of the predetermined proportional gain linear phenotypic selection index (PPG-LPSI) described in Chap. 3, i.e., under some predetermined restrictions, to optimize the expected genetic gains per trait, to predict the net genetic merit, and to select the individuals with the highest net genetic merit values as parents of the next generation. The MPPG-LPSI allows restrictions different from zero to be imposed on the expected genetic gains of some traits, whereas other traits increase (or decrease) their expected genetic gains without any restrictions being imposed.

### 9.3.1 The MPPG-LPSI Parameters

In a similar manner to the MRLPSI, the MPPG-LPSI vector of coefficients for stages 1 and 2 should be a linear transformation of the MLPSI vector of coefficients at stages 1 ($\mathbf{b}_1 = \mathbf{P}_1^{-1}\mathbf{G}_1\mathbf{w}$) and 2 ($\mathbf{b}_2 = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$), and should be written as

$$\mathbf{b}_{M_1} = \mathbf{K}_{M_1}\mathbf{b}_1 \tag{9.17}$$

and

$$\mathbf{b}_{M_2} = \mathbf{K}_{M_2}\mathbf{b}_2, \tag{9.18}$$

respectively, where, at stage 1, $\mathbf{K}_{M_1} = [\mathbf{I}_1 - \mathbf{Q}_{M_1}]$, $\mathbf{Q}_{M_1} = \mathbf{P}_1^{-1}\mathbf{M}_1(\mathbf{M}_1'\mathbf{P}_1^{-1}\mathbf{M}_1)^{-1}\mathbf{M}_1'$, $\mathbf{M}_1' = \mathbf{D}'\boldsymbol{\Psi}_1'$, $\boldsymbol{\Psi}_1' = \mathbf{U}'\mathbf{G}_1'$, $\mathbf{I}_1$ is an identity matrix of the same size as $\mathbf{P}_1$, and $\mathbf{P}_1^{-1}$ is the inverse of matrix $\mathbf{P}_1$. At stage 2, $\mathbf{K}_{M_2} = [\mathbf{I} - \mathbf{Q}_{M_2}]$, $\mathbf{Q}_{M_2} = \mathbf{P}^{-1}\mathbf{M}(\mathbf{M}'\mathbf{P}^{-1}\mathbf{M})^{-1}\mathbf{M}'$, $\mathbf{M}' = \mathbf{D}'\boldsymbol{\Psi}'$, $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$, $\mathbf{I}$ is an identity matrix of the same size as $\mathbf{P}$, $\mathbf{P}^{-1}$ is the inverse of matrix $\mathbf{P}$, and

$$
\mathbf{D}' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix},
$$

where $d_q$ ($q = 1, 2, \ldots, r$) is the $q$th element of $\mathbf{d}' = \begin{bmatrix} d_1 & d_2 & \cdots & d_r \end{bmatrix}$, the vector of predetermined proportional gains (PPG) imposed by the breeder (see Chap. 3 for details).
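The structure of $\mathbf{D}'$ is easy to generate programmatically. The sketch below is illustrative only: the function name `build_d_transpose` is hypothetical, and it assumes the layout in which each of the $r - 1$ rows carries $d_r$ on the diagonal and $-d_q$ in the last column. Every row is then orthogonal to $\mathbf{d}$, so the condition $\mathbf{D}'\boldsymbol{\Psi}'\mathbf{b} = \mathbf{0}$ forces $\boldsymbol{\Psi}'\mathbf{b}$ to be proportional to $\mathbf{d}$.

```python
def build_d_transpose(d):
    """Return the (r-1) x r matrix D' built from the PPG vector d."""
    r = len(d)
    rows = []
    for q in range(r - 1):
        row = [0] * r
        row[q] = d[-1]       # d_r on the diagonal
        row[-1] = -d[q]      # -d_q in the last column
        rows.append(row)
    return rows


d = [2, 3, 5]
D_t = build_d_transpose(d)

# Each row of D' is orthogonal to d, so D'd = 0.
residual = [sum(a * b for a, b in zip(row, d)) for row in D_t]

print(D_t)       # [[5, 0, -2], [0, 5, -3]]
print(residual)  # [0, 0]
```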

By Eqs. (9.17) and (9.18), the MPPG-LPSI for stages 1 and 2 can be written as $I_{M_1} = \mathbf{b}_{M_1}'\mathbf{x}_1$ and $I_{M_2} = \mathbf{b}_{M_2}'\mathbf{y}$ respectively, where, assuming that at stage 1 we select four traits and at stage 2 we select two traits, $\mathbf{x}_1' = \begin{bmatrix} y_1 & y_2 & y_3 & y_4 \end{bmatrix}$ and $\mathbf{x}_2' = \begin{bmatrix} y_5 & y_6 \end{bmatrix}$ are the vectors of phenotypic observations at stages 1 and 2 respectively, and $\mathbf{y}' = \begin{bmatrix} \mathbf{x}_1' & \mathbf{x}_2' \end{bmatrix}$ is the vector of total phenotypic observations at stage 2.

Let $k_1$ and $k_2$ be the selection intensities for stages 1 and 2 (Eqs. 9.10 and 9.11) respectively and let $\mathbf{P}^{*}$ and $\mathbf{C}^{*}$ be the adjusted matrices according to Eqs. (9.5) and (9.6) in the MPPG-LPSI context. Then, the MPPG-LPSI selection response and expected genetic gain per trait for both stages can be written as

$$R\_{M\_1} = k\_1 \sqrt{\mathbf{b}\_{M\_1}' \mathbf{P}\_1 \mathbf{b}\_{M\_1}} \quad \text{and} \quad R\_{M\_2} = k\_2 \sqrt{\mathbf{b}\_{M\_2}' \mathbf{P}^\* \mathbf{b}\_{M\_2}} \tag{9.19}$$

and

$$\mathbf{E}\_{M\_1} = k\_1 \frac{\mathbf{G}\_1^{\prime} \mathbf{b}\_{M\_1}}{\sqrt{\mathbf{b}\_{M\_1}^{\prime} \mathbf{P}\_1 \mathbf{b}\_{M\_1}}} \quad \text{and} \quad \mathbf{E}\_{M\_2} = k\_2 \frac{\mathbf{b}\_{M\_2}^{\prime} \mathbf{C}^\*}{\sqrt{\mathbf{b}\_{M\_2}^{\prime} \mathbf{P}^\* \mathbf{b}\_{M\_2}}}, \tag{9.20}$$

respectively, whereas the total MPPG-LPSI selection response and expected genetic gain per trait for both stages are equal to $R_{M_1} + R_{M_2}$ and $\mathbf{E}_{M_1} + \mathbf{E}_{M_2}$. In addition, the MPPG-LPSI accuracy for both stages can be written as

$$\rho_{M_1} = \sqrt{\frac{\mathbf{b}_{M_1}'\mathbf{P}_1\mathbf{b}_{M_1}}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad \text{and} \quad \rho_{M_2} = \sqrt{\frac{\mathbf{b}_{M_2}'\mathbf{P}^{*}\mathbf{b}_{M_2}}{\mathbf{w}'\mathbf{C}^{*}\mathbf{w}}}. \tag{9.21}$$

### 9.3.2 Numerical Examples

We use the real data set described in Sect. 9.2.2 to illustrate the theoretical results of the MPPG-LPSI in the same form as we did with those of the MRLPSI. We need to estimate vectors $\mathbf{b}_{M_1}' = \mathbf{b}_1'\mathbf{K}_{M_1}'$ and $\mathbf{b}_{M_2}' = \mathbf{b}_2'\mathbf{K}_{M_2}'$, where $\mathbf{b}_1' = \mathbf{w}'\mathbf{G}_1'\mathbf{P}_1^{-1}$ and $\mathbf{b}_2' = \mathbf{w}'\mathbf{C}\mathbf{P}^{-1}$. In Chap. 3 we gave methods to estimate $\mathbf{K}_M = [\mathbf{I} - \mathbf{Q}_M]$, $\mathbf{Q}_M = \mathbf{P}^{-1}\mathbf{M}(\mathbf{M}'\mathbf{P}^{-1}\mathbf{M})^{-1}\mathbf{M}'$, $\mathbf{M}' = \mathbf{D}'\boldsymbol{\Psi}'$, and $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$, which will be used in this subsection.

The estimated phenotypic and genotypic covariance matrices at stage 1 were

$$
\widehat{\mathbf{P}}_1 = \begin{bmatrix} 102 & 32 & 14 & 4 \\ 32 & 80 & 80 & 16 \\ 14 & 80 & 298 & 78 \\ 4 & 16 & 78 & 66 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{G}}_1 = \begin{bmatrix} 44 & 11 & -11 & -3 & -8 & -3 \\ 11 & 26 & 24 & 7 & 7 & 3 \\ -11 & 24 & 62 & 23 & 37 & 20 \\ -3 & 7 & 23 & 14 & 22 & 14 \end{bmatrix}
$$

respectively, whereas $\mathbf{w}' = \begin{bmatrix} 0.08 & 0.08 & 0.38 & 0.08 & 0.08 & 0.31 \end{bmatrix}$ was the vector of economic weights. The traits restricted at both stages are $y_1$, $y_2$, and $y_3$. The vector of PPG was $\mathbf{d}' = \begin{bmatrix} 2 & 3 & 5 \end{bmatrix}$, whence

$$
\mathbf{D}' = \begin{bmatrix} 5 & 0 & -2 \\ 0 & 5 & -3 \end{bmatrix} \quad \text{and} \quad \mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}
$$

were matrices $\mathbf{D}'$ and $\mathbf{U}'$. The estimated matrices of $\mathbf{M}_1'$ and $\mathbf{K}_{M_1} = \mathbf{I} - \mathbf{Q}_{M_1}$ were

$$
\widehat{\mathbf{M}}_1' = \mathbf{D}'\widehat{\boldsymbol{\Psi}}_1' = \begin{bmatrix} 242 & 7 & -178 & -61 \\ 88 & 58 & -66 & -34 \end{bmatrix}
$$

and

$$
\widehat{\mathbf{K}}_{M_1} = \begin{bmatrix}
0.176 & 0.205 & 0.606 & 0.159 \\
0.031 & 0.032 & -0.007 & 0.199 \\
0.195 & 0.235 & 0.852 & -0.098 \\
0.130 & 0.130 & -0.098 & 0.940
\end{bmatrix}
$$

respectively, where $\widehat{\boldsymbol{\Psi}}_1' = \mathbf{U}'\widehat{\mathbf{G}}_1'$.

At stage 1, the estimated MPPG-LPSI vector of coefficients was $\widehat{\mathbf{b}}_{M_1}' = \widehat{\mathbf{b}}_1'\widehat{\mathbf{K}}_{M_1}' = \begin{bmatrix} 0.068 & 0.035 & 0.039 & 0.160 \end{bmatrix}$, where $\widehat{\mathbf{b}}_1' = \mathbf{w}'\widehat{\mathbf{G}}_1'\widehat{\mathbf{P}}_1^{-1} = \begin{bmatrix} -0.067 & 0.125 & 0.045 & 0.167 \end{bmatrix}$; at stage 2, the vector $\widehat{\mathbf{b}}_{M_2}' = \widehat{\mathbf{b}}_2'\widehat{\mathbf{K}}_{M_2}'$ was estimated in the same manner, whence the estimated MPPG-LPSI were $\widehat{I}_{M_1} = \widehat{\mathbf{b}}_{M_1}'\mathbf{x}_1$ and $\widehat{I}_{M_2} = \widehat{\mathbf{b}}_{M_2}'\mathbf{y}$. The estimated correlation value ($\widehat{\rho}_{M_{12}}$) between $\widehat{I}_{M_1}$ and $\widehat{I}_{M_2}$ was

$$
\widehat{\rho}_{M_{12}} = \frac{\widehat{\mathbf{b}}_{M_1}'\begin{bmatrix} \widehat{\mathbf{P}}_1 & \widehat{\mathbf{P}}_{21} \end{bmatrix}\widehat{\mathbf{b}}_{M_2}}{\sqrt{\widehat{\mathbf{b}}_{M_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{M_1}}\sqrt{\widehat{\mathbf{b}}_{M_2}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{M_2}}} = 0.870,
$$

where $\sqrt{\widehat{\mathbf{b}}_{M_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{M_1}}$ and $\sqrt{\widehat{\mathbf{b}}_{M_2}'\widehat{\mathbf{P}}\widehat{\mathbf{b}}_{M_2}}$ were the estimated standard deviations of $\widehat{I}_{M_1} = \widehat{\mathbf{b}}_{M_1}'\mathbf{x}_1$ and $\widehat{I}_{M_2} = \widehat{\mathbf{b}}_{M_2}'\mathbf{y}$ respectively. According to Young (1964, Fig. 8), the selection intensities for stages 1 and 2 were $k_1 = 0.744$ and $k_2 = 0.721$ (Eqs. 9.10 and 9.11) respectively.
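The stage 1 MPPG-LPSI coefficients can be retraced from the printed quantities. The sketch below assumes the rounded $\widehat{\mathbf{K}}_{M_1}$ and $\widehat{\mathbf{b}}_1$ given in this subsection and uses only the first three columns of $\widehat{\mathbf{G}}_1$ (the restricted traits); variable names are illustrative. It forms $\widehat{\mathbf{b}}_{M_1} = \widehat{\mathbf{K}}_{M_1}\widehat{\mathbf{b}}_1$ and then checks that the covariances of the index with the three restricted traits are, up to rounding, in the proportions $2:3:5$ set by $\mathbf{d}$.

```python
# Rounded values printed in this subsection.
K_M1 = [[0.176, 0.205, 0.606, 0.159],
        [0.031, 0.032, -0.007, 0.199],
        [0.195, 0.235, 0.852, -0.098],
        [0.130, 0.130, -0.098, 0.940]]
b1 = [-0.067, 0.125, 0.045, 0.167]

# First three columns of G1-hat: covariances of x1 with g1, g2, g3.
G1_restricted = [[44, 11, -11],
                 [11, 26, 24],
                 [-11, 24, 62],
                 [-3, 7, 23]]
d = [2, 3, 5]

# b_M1 = K_M1 b1
b_M1 = [sum(k * b for k, b in zip(row, b1)) for row in K_M1]

# Covariance of the index with each restricted trait: (G1' b_M1)_j
gains = [sum(G1_restricted[i][j] * b_M1[i] for i in range(4)) for j in range(3)]

# If the gains are proportional to d, these ratios are (nearly) equal.
ratios = [g / dq for g, dq in zip(gains, d)]

print([round(v, 3) for v in b_M1])
print([round(v, 3) for v in ratios])
```

Equal ratios mean that the expected gains of $y_1$, $y_2$, and $y_3$ at stage 1 share a common proportionality constant, which is what the PPG restriction is designed to achieve.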

The estimated selection responses and expected genetic gains per trait for both stages were

$$
\widehat{R}_{M_1} = k_1\sqrt{\widehat{\mathbf{b}}_{M_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{M_1}} = 1.553 \quad \text{and} \quad \widehat{R}_{M_2} = k_2\sqrt{\widehat{\mathbf{b}}_{M_2}'\widehat{\mathbf{P}}^{*}\widehat{\mathbf{b}}_{M_2}} = 1.401,
$$

$$
\widehat{\mathbf{E}}_{M_1}' = k_1\frac{\widehat{\mathbf{G}}_1'\widehat{\mathbf{b}}_{M_1}}{\sqrt{\widehat{\mathbf{b}}_{M_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{M_1}}} = \begin{bmatrix} 0.877 & 1.316 & 2.193 & 1.128 & 1.655 & 1.037 \end{bmatrix}
$$

and

$$
\widehat{\mathbf{E}}_{M_2}' = k_2\frac{\widehat{\mathbf{C}}^{*\prime}\widehat{\mathbf{b}}_{M_2}}{\sqrt{\widehat{\mathbf{b}}_{M_2}'\widehat{\mathbf{P}}^{*}\widehat{\mathbf{b}}_{M_2}}} = \begin{bmatrix} 0.878 & 1.346 & 2.604 & 1.433 & 2.506 & 1.602 \end{bmatrix},
$$

whereas $\widehat{R}_{M_1} + \widehat{R}_{M_2} = 2.954$ and $\widehat{\mathbf{E}}_{M_1}' + \widehat{\mathbf{E}}_{M_2}' = \begin{bmatrix} 1.755 & 2.662 & 4.797 & 2.561 & 4.161 & 2.639 \end{bmatrix}$ were the total estimated MPPG-LPSI selection response and expected genetic gain per trait respectively. Note that the vector of predetermined restrictions was $\mathbf{d}' = \begin{bmatrix} 2 & 3 & 5 \end{bmatrix}$. The MPPG-LPSI was therefore efficient at attaining the predetermined gains: the differences between the predetermined values (2, 3, and 5) and the total predicted gains of the restricted traits (1.755, 2.662, and 4.797) were only 0.245, 0.338, and 0.203 respectively.
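The totals and the differences from the predetermined gains quoted above follow from simple arithmetic on the two stage-wise gain vectors. A minimal sketch, using only the values reported in this subsection (variable names are illustrative):

```python
# Expected genetic gains per trait reported for stages 1 and 2.
E_M1 = [0.877, 1.316, 2.193, 1.128, 1.655, 1.037]
E_M2 = [0.878, 1.346, 2.604, 1.433, 2.506, 1.602]
d = [2, 3, 5]  # predetermined proportional gains for y1, y2, y3

# Total expected gain per trait over both stages.
total = [round(a + b, 3) for a, b in zip(E_M1, E_M2)]

# Difference between each predetermined value and the total predicted gain
# (zip stops after the three restricted traits).
shortfall = [round(dq - t, 3) for dq, t in zip(d, total)]

print(total)      # [1.755, 2.662, 4.797, 2.561, 4.161, 2.639]
print(shortfall)  # [0.245, 0.338, 0.203]
```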

Finally, the estimated MPPG-LPSI accuracy at stage 1 was $\widehat{\rho}_{M_1} = \sqrt{\dfrac{\widehat{\mathbf{b}}_{M_1}'\widehat{\mathbf{P}}_1\widehat{\mathbf{b}}_{M_1}}{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}} = 0.435$, and at stage 2 it was $\widehat{\rho}_{M_2} = \sqrt{\dfrac{\widehat{\mathbf{b}}_{M_2}'\widehat{\mathbf{P}}^{*}\widehat{\mathbf{b}}_{M_2}}{\mathbf{w}'\widehat{\mathbf{C}}^{*}\mathbf{w}}} = 0.428$; that is, both were very similar.

#### 9.4 The Multistage Linear Genomic Selection Index

We describe the multistage linear genomic selection indices (MLGSI) as an extension of the linear genomic selection index (LGSI, Chap. 5) theory to the multistage genomic selection context; thus, the theoretical results of the MLGSI are very similar to those of the LGSI. The MLGSI is a linear combination of genomic estimated breeding values (GEBVs) and is useful for predicting individual net genetic merit and for selecting individuals from a nonphenotyped testing population as parents of the next selection cycle.

### 9.4.1 The MLGSI Parameters

The objective of the MLGSI is to predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$, where $\mathbf{g}$ is a vector of true breeding values and $\mathbf{w}'$ is the vector of economic weights, using only GEBVs. In Chap. 5, we indicated that the covariance between $\gamma_i$ and $g_i$ is equal to the variance of $\gamma_i$, i.e., $Cov(g_i, \gamma_i) = s_i^2$, and that the GEBV associated with the $i$th trait is a predictor of the $i$th vector of genomic breeding values ($\gamma_i$). In the testing population, the only observable information is $\mathbf{w}'$ and the GEBV associated with the traits of interest. For this reason, in practice, we construct a linear combination of GEBVs, which should be a good predictor of $H = \mathbf{w}'\mathbf{g}$.

Suppose that the breeder is interested in four traits, and that $\boldsymbol{\gamma}' = \begin{bmatrix} \gamma_1 & \gamma_2 & \gamma_3 & \gamma_4 \end{bmatrix}$, $\mathbf{g}' = \begin{bmatrix} g_1 & g_2 & g_3 & g_4 \end{bmatrix}$, and $\mathbf{w}' = \begin{bmatrix} w_1 & w_2 & w_3 & w_4 \end{bmatrix}$ are the vectors of genomic breeding values ($\boldsymbol{\gamma}$), true breeding values ($\mathbf{g}$), and economic weights ($\mathbf{w}$) respectively. Let

$$
\boldsymbol{\Gamma} = Var(\boldsymbol{\gamma}) = \begin{bmatrix} s_1^2 & s_{12} & s_{13} & s_{14} \\ s_{21} & s_2^2 & s_{23} & s_{24} \\ s_{31} & s_{32} & s_3^2 & s_{34} \\ s_{41} & s_{42} & s_{43} & s_4^2 \end{bmatrix} \quad \text{and} \quad \mathbf{C} = Var(\mathbf{g}) = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}
$$

be the covariance matrices of $\boldsymbol{\gamma}$ and $\mathbf{g}$ respectively. In a two-stage selection breeding scheme, $\boldsymbol{\gamma}' = \begin{bmatrix} \gamma_1 & \gamma_2 & \gamma_3 & \gamma_4 \end{bmatrix}$ can be partitioned into $\boldsymbol{\gamma}_1' = \begin{bmatrix} \gamma_1 & \gamma_2 \end{bmatrix}$ and $\boldsymbol{\gamma}_2' = \begin{bmatrix} \gamma_3 & \gamma_4 \end{bmatrix}$; therefore, at stage 1,

$$
\boldsymbol{\Gamma}_1 = Var(\boldsymbol{\gamma}_1) = \begin{bmatrix} s_1^2 & s_{12} \\ s_{21} & s_2^2 \end{bmatrix}
$$

is the genomic covariance matrix of $\boldsymbol{\gamma}_1' = \begin{bmatrix} \gamma_1 & \gamma_2 \end{bmatrix}$ and

$$
Cov(\boldsymbol{\gamma}_1, \mathbf{g}) = \begin{bmatrix} s_1^2 & s_{12} & s_{13} & s_{14} \\ s_{12} & s_2^2 & s_{23} & s_{24} \end{bmatrix} = \mathbf{A}_1
$$

is the covariance matrix of $\boldsymbol{\gamma}_1' = \begin{bmatrix} \gamma_1 & \gamma_2 \end{bmatrix}$ with $\mathbf{g}' = \begin{bmatrix} g_1 & g_2 & g_3 & g_4 \end{bmatrix}$. Matrix $\mathbf{A}_1$ indicates that we are assuming that the covariance between $\gamma_i$ and $g_j$ ($i, j = 1, 2, \ldots, g$; $g$ = number of genotypes) is equal to the covariance between $\gamma_i$ and $\gamma_j$. This is because, in practice, in the testing population, we can only estimate matrix $\boldsymbol{\Gamma}$.

At stage 2, $\boldsymbol{\Gamma} = Var(\boldsymbol{\gamma})$ is the covariance matrix of $\boldsymbol{\gamma}$ and $\mathbf{A} = \boldsymbol{\Gamma}$ is the covariance matrix of the vector of genomic breeding values $\boldsymbol{\gamma}$ with the vector of breeding values $\mathbf{g}$. The MLGSI vectors of coefficients at stages 1 and 2 are $\boldsymbol{\beta}_1' = \mathbf{w}'\mathbf{A}_1'\boldsymbol{\Gamma}_1^{-1} = \begin{bmatrix} \beta_{11} & \beta_{12} \end{bmatrix}$ and $\boldsymbol{\beta}_2' = \mathbf{w}'\mathbf{A}\boldsymbol{\Gamma}^{-1} = \mathbf{w}' = \begin{bmatrix} w_1 & w_2 & w_3 & w_4 \end{bmatrix}$ respectively, and the MLGSI for both stages can be written as $I_1 = \beta_{11}\gamma_1 + \beta_{12}\gamma_2 = \boldsymbol{\beta}_1'\boldsymbol{\gamma}_1$ and $I_2 = w_1\gamma_1 + w_2\gamma_2 + w_3\gamma_3 + w_4\gamma_4 = \mathbf{w}'\boldsymbol{\gamma}$.

Let $k_1$ and $k_2$ be the MLGSI selection intensities for stages 1 and 2. For both stages, the MLGSI accuracies ($\rho_{HI_1}$ and $\rho_{HI_2}$), expected genetic gains per trait ($\mathbf{E}_1$ and $\mathbf{E}_2$), and selection responses ($R_1$ and $R_2$) can be written as

$$\rho_{HI_1} = \sqrt{\frac{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad \text{and} \quad \rho_{HI_2} = \sqrt{\frac{\mathbf{w}'\boldsymbol{\Gamma}^{*}\mathbf{w}}{\mathbf{w}'\mathbf{C}^{*}\mathbf{w}}}, \tag{9.22}$$

$$\mathbf{E}_1 = k_1\frac{\mathbf{A}_1'\boldsymbol{\beta}_1}{\sqrt{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1}} \quad \text{and} \quad \mathbf{E}_2 = k_2\frac{\boldsymbol{\Gamma}^{*}\mathbf{w}}{\sqrt{\mathbf{w}'\boldsymbol{\Gamma}^{*}\mathbf{w}}} \tag{9.23}$$

and

$$R_1 = k_1\sqrt{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1} \quad \text{and} \quad R_2 = k_2\sqrt{\mathbf{w}'\boldsymbol{\Gamma}^{*}\mathbf{w}}. \tag{9.24}$$

The total MLGSI expected genetic gain per trait and selection response at both stages are equal to $\mathbf{E}_1 + \mathbf{E}_2$ and $R_1 + R_2$. To simplify notation, in Eqs. (9.23) and (9.24), we have omitted the intervals between stages or selection cycles ($L_G$). Matrices $\boldsymbol{\Gamma}^{*}$ and $\mathbf{C}^{*}$ in Eqs. (9.22) to (9.24) are matrices $\boldsymbol{\Gamma}$ and $\mathbf{C}$ adjusted for previous selection on $I_1$.

We adjust matrices $\boldsymbol{\Gamma}$ and $\mathbf{C}$ for previous selection on $I_1$ as

$$\boldsymbol{\Gamma}^{*} = \boldsymbol{\Gamma} - u\frac{\mathbf{A}_1'\boldsymbol{\beta}_1\boldsymbol{\beta}_1'\mathbf{A}_1}{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1} \tag{9.25}$$

and

$$\mathbf{C}^{*} = \mathbf{C} - u\frac{\mathbf{G}_1'\mathbf{b}_1\mathbf{b}_1'\mathbf{G}_1}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1}, \tag{9.26}$$

respectively, where $u = k_1(k_1 - \tau)$, $k_1$ is the standardized selection differential, and $\tau$ is the truncation point when $I_1 = \boldsymbol{\beta}_1'\boldsymbol{\gamma}_1$ is applied. All the terms in Eq. (9.26) were defined in Eq. (9.6).
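Equation (9.25) is a rank-one correction of $\boldsymbol{\Gamma}$: the outer product of $\mathbf{A}_1'\boldsymbol{\beta}_1$ with itself, scaled by $u$ and by the variance of $I_1$, is subtracted. The sketch below illustrates this with a small two-trait toy case; the function name `adjust_for_selection` and all numbers, including $u = 0.5$, are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def adjust_for_selection(Gamma, A1, beta1, Gamma1, u):
    """Return Gamma* = Gamma - u (A1'b)(A1'b)' / (b' Gamma1 b), as in Eq. (9.25)."""
    t = len(Gamma)        # number of traits
    s = len(beta1)        # number of traits used at stage 1
    # v = A1' beta1: covariance of the stage 1 index with each trait
    v = [sum(A1[i][j] * beta1[i] for i in range(s)) for j in range(t)]
    # Variance of the stage 1 index, beta1' Gamma1 beta1
    var_I1 = sum(beta1[i] * Gamma1[i][j] * beta1[j]
                 for i in range(s) for j in range(s))
    return [[Gamma[i][j] - u * v[i] * v[j] / var_I1 for j in range(t)]
            for i in range(t)]


# Hypothetical toy inputs: two traits, only trait 1 used at stage 1.
Gamma = [[4.0, 1.0], [1.0, 3.0]]
A1 = [[4.0, 1.0]]
Gamma1 = [[4.0]]
beta1 = [1.0]
u = 0.5

Gamma_star = adjust_for_selection(Gamma, A1, beta1, Gamma1, u)
print(Gamma_star)  # [[2.0, 0.5], [0.5, 2.875]]
```

With $u = 0$ (no selection at stage 1) the function returns $\boldsymbol{\Gamma}$ unchanged; for $u > 0$ the diagonal entries can only shrink, which is why the stage 2 parameters tend to be smaller after adjustment.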

The correlation between $I_1 = \boldsymbol{\beta}_1'\boldsymbol{\gamma}_1$ and $I_2 = \mathbf{w}'\boldsymbol{\gamma}$ can be written as

$$Corr(I_1, I_2) = \frac{\boldsymbol{\beta}_1'\mathbf{A}_1\mathbf{w}}{\sqrt{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1}\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}} = \rho_{I_1I_2}, \tag{9.27}$$

where $\sqrt{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1}$ and $\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}$ are the standard deviations of $I_1 = \boldsymbol{\beta}_1'\boldsymbol{\gamma}_1$ and $I_2 = \mathbf{w}'\boldsymbol{\gamma}$ respectively. In Eq. (9.27), matrix $\boldsymbol{\Gamma}$ was not adjusted according to Eq. (9.25).

### 9.4.2 Estimating the Genomic Covariance Matrix

All the MLGSI parameters are associated with matrix Γ; thus, the estimation of this matrix in the testing population is very important. We estimate matrix Γ according to the estimation method described in Chap. 5 (Eq. 5.25), that is, as

$$\widehat{\boldsymbol{\Gamma}}_l = \left\{\widehat{\sigma}_{\gamma_{qq'}}\right\}, \tag{9.28}$$

where $\widehat{\sigma}_{\gamma_{qq'}} = \dfrac{1}{g}\left(\widehat{\boldsymbol{\gamma}}_{ql} - \mathbf{1}\widehat{\mu}_{\gamma_{ql}}\right)'\mathbf{G}_l^{-1}\left(\widehat{\boldsymbol{\gamma}}_{q'l} - \mathbf{1}\widehat{\mu}_{\gamma_{q'l}}\right)$ is the estimated covariance between $\widehat{\boldsymbol{\gamma}}_{ql} = \mathbf{X}_l\widehat{\mathbf{u}}_q$ and $\widehat{\boldsymbol{\gamma}}_{q'l} = \mathbf{X}_l\widehat{\mathbf{u}}_{q'}$ at stage $l$ or selection cycle of the testing population; $g$ is the number of genotypes; $\widehat{\mu}_{\gamma_{ql}}$ and $\widehat{\mu}_{\gamma_{q'l}}$ are the estimated arithmetic means of the values of $\widehat{\boldsymbol{\gamma}}_{ql}$ and $\widehat{\boldsymbol{\gamma}}_{q'l}$; $\mathbf{1}$ is a $g \times 1$ vector of 1s; and $\mathbf{G}_l = c^{-1}\mathbf{X}_l\mathbf{X}_l'$ is the additive genomic relationship matrix at stage $l$ or selection cycle in the testing population (see Chap. 5 for details).
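To make the estimator concrete, the sketch below evaluates one element of $\widehat{\boldsymbol{\Gamma}}_l$ for two GEBV vectors. It is an illustration under simplifying assumptions: the function name `genomic_cov` is hypothetical, the inverse relationship matrix is passed in directly rather than computed from marker data, and the three-genotype toy values are invented. With $\mathbf{G}_l^{-1}$ set to the identity, the expression reduces to the ordinary covariance with divisor $g$, which makes the result easy to verify by hand.

```python
def genomic_cov(gebv_q, gebv_q2, G_inv):
    """(1/g) (gamma_q - 1 mu_q)' G^{-1} (gamma_q' - 1 mu_q') for two GEBV vectors."""
    g = len(gebv_q)                      # number of genotypes
    mu_q = sum(gebv_q) / g
    mu_q2 = sum(gebv_q2) / g
    dq = [x - mu_q for x in gebv_q]      # centered GEBVs, trait q
    dq2 = [x - mu_q2 for x in gebv_q2]   # centered GEBVs, trait q'
    quad = sum(dq[i] * G_inv[i][j] * dq2[j] for i in range(g) for j in range(g))
    return quad / g


# Hypothetical toy data: three genotypes, two traits, unrelated individuals.
gebv_1 = [1.0, 2.0, 3.0]
gebv_2 = [2.0, 4.0, 9.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

var_1 = genomic_cov(gebv_1, gebv_1, identity)    # (1 + 0 + 1) / 3
cov_12 = genomic_cov(gebv_1, gebv_2, identity)   # (3 + 0 + 4) / 3
print(var_1, cov_12)
```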

### 9.4.3 Numerical Examples

We illustrate the MLGSI theoretical results using the data described in Chap. 2, Sect. 2.8.1 simulated for eight phenotypic and seven genomic selection cycles, each with four traits ($T_1$, $T_2$, $T_3$, and $T_4$), 500 genotypes, four replicates for each genotype, 2500 molecular markers, and 315 quantitative trait loci in one environment. The economic weights of $T_1$, $T_2$, $T_3$, and $T_4$ were 1, $-1$, 1, and 1 respectively. In this subsection, and only for illustrative purposes, we use the data set from cycle 1.

The genotypic and genomic estimated covariance matrices in cycle 1 were

$$
\widehat{\mathbf{C}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.4 & -2.24 \\ 8.35 & -3.4 & 9.96 & 0.16 \\ 2.74 & -2.24 & 0.16 & 6.64 \end{bmatrix} \quad \text{and} \quad \widehat{\boldsymbol{\Gamma}} = \begin{bmatrix} 16.26 & -6.51 & 5.60 & 2.29 \\ -6.51 & 5.79 & -2.23 & -1.62 \\ 5.60 & -2.23 & 3.75 & 0.94 \\ 2.29 & -1.62 & 0.94 & 2.62 \end{bmatrix}
$$

respectively, whereas $\mathbf{w}' = \begin{bmatrix} 1 & -1 & 1 & 1 \end{bmatrix}$ was the vector of economic weights. Matrices $\widehat{\mathbf{P}}$ and $\widehat{\mathbf{C}}$ were obtained according to Eqs. (2.22) to (2.24), whereas matrix $\widehat{\boldsymbol{\Gamma}}$ was obtained according to Eq. (9.28).

Suppose that we select two traits at stages 1 and 2. Then, at stage 1,

$$
\widehat{\boldsymbol{\Gamma}}_1 = \begin{bmatrix} 16.26 & -6.51 \\ -6.51 & 5.79 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{A}}_1 = \begin{bmatrix} 16.26 & -6.51 & 5.60 & 2.29 \\ -6.51 & 5.79 & -2.33 & -1.62 \end{bmatrix}
$$

are the estimates of $\boldsymbol{\Gamma}_1$ and $\mathbf{A}_1$ respectively, and the estimated MLGSI vector of coefficients was $\widehat{\boldsymbol{\beta}}_1' = \mathbf{w}'\widehat{\mathbf{A}}_1'\widehat{\boldsymbol{\Gamma}}_1^{-1} = \begin{bmatrix} 1.39 & -1.25 \end{bmatrix}$. Because at stage 2 $\boldsymbol{\beta}_2' = \mathbf{w}'\mathbf{A}\boldsymbol{\Gamma}^{-1} = \mathbf{w}' = \begin{bmatrix} w_1 & w_2 & w_3 & w_4 \end{bmatrix}$, the estimated MLGSI vector of coefficients is the vector of economic weights. Thus,

$$
\widehat{\rho}_{I_1I_2} = \frac{\widehat{\boldsymbol{\beta}}_1'\widehat{\mathbf{A}}_1\mathbf{w}}{\sqrt{\widehat{\boldsymbol{\beta}}_1'\widehat{\boldsymbol{\Gamma}}_1\widehat{\boldsymbol{\beta}}_1}\sqrt{\mathbf{w}'\widehat{\boldsymbol{\Gamma}}\mathbf{w}}} = 0.97
$$

was the estimated correlation between $\widehat{I}_1 = \widehat{\boldsymbol{\beta}}_1'\widehat{\boldsymbol{\gamma}}_1$ and $\widehat{I}_2 = \mathbf{w}'\widehat{\boldsymbol{\gamma}}$, and assuming that the fixed proportion was 0.2 (20%), $k_1 = 0.744$ and $k_2 = 0.721$ were the approximated selection intensities for stages 1 and 2 respectively. The adjusted matrices $\boldsymbol{\Gamma}^{*}$ and $\mathbf{C}^{*}$ for previous selection on $\widehat{I}_1 = \widehat{\boldsymbol{\beta}}_1'\widehat{\boldsymbol{\gamma}}_1$ were

$$
\widehat{\boldsymbol{\Gamma}}^{*} = \begin{bmatrix} 7.96 & -2.11 & 2.71 & 0.88 \\ -2.11 & 3.46 & -0.80 & -0.87 \\ 2.71 & -0.80 & 2.75 & 0.45 \\ 0.88 & -0.87 & 0.45 & 2.38 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{C}}^{*} = \begin{bmatrix} 24.40 & -5.65 & 5.47 & 1.39 \\ -5.65 & 8.55 & -1.63 & -1.41 \\ 5.47 & -1.63 & 9.26 & -0.17 \\ 1.39 & -1.41 & -0.17 & 6.49 \end{bmatrix}.
$$
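The stage 1 coefficients and the correlation between the two indices can be recomputed from the printed estimates. The sketch below assumes the rounded matrices given above, with minus signs following the pattern of $\widehat{\boldsymbol{\Gamma}}$, and uses the closed-form inverse of a $2 \times 2$ matrix; variable names are illustrative. Because the inputs are rounded to two decimals, agreement with the reported values is expected only to about two decimals.

```python
import math

# Rounded estimates printed in this subsection (cycle 1).
Gamma1 = [[16.26, -6.51], [-6.51, 5.79]]
A1 = [[16.26, -6.51, 5.60, 2.29],
      [-6.51, 5.79, -2.33, -1.62]]
Gamma = [[16.26, -6.51, 5.60, 2.29],
         [-6.51, 5.79, -2.23, -1.62],
         [5.60, -2.23, 3.75, 0.94],
         [2.29, -1.62, 0.94, 2.62]]
w = [1, -1, 1, 1]

# A1 w: covariance of the stage 1 GEBVs with the net genetic merit.
A1w = [sum(a * x for a, x in zip(row, w)) for row in A1]

# beta1 = Gamma1^{-1} A1 w, using the closed-form 2 x 2 inverse.
det = Gamma1[0][0] * Gamma1[1][1] - Gamma1[0][1] * Gamma1[1][0]
beta1 = [(Gamma1[1][1] * A1w[0] - Gamma1[0][1] * A1w[1]) / det,
         (-Gamma1[1][0] * A1w[0] + Gamma1[0][0] * A1w[1]) / det]

# Variances of I1 and I2 and their covariance (Eq. 9.27).
var_I1 = sum(beta1[i] * Gamma1[i][j] * beta1[j] for i in range(2) for j in range(2))
var_I2 = sum(w[i] * Gamma[i][j] * w[j] for i in range(4) for j in range(4))
cov_I1_I2 = sum(b * v for b, v in zip(beta1, A1w))
rho_I1_I2 = cov_I1_I2 / (math.sqrt(var_I1) * math.sqrt(var_I2))

print([round(b, 2) for b in beta1], round(rho_I1_I2, 2))
```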

The estimated MLGSI accuracy, selection response, and expected genetic gain for stage 1 in the testing population were <sup>b</sup>ρHI<sup>1</sup> <sup>¼</sup> ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi bβ0 <sup>1</sup>Γb1βb<sup>1</sup> w0 Cwb s ¼ <sup>0</sup>:71, <sup>R</sup>b<sup>1</sup> <sup>¼</sup> <sup>k</sup><sup>1</sup> ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi bβ0 <sup>1</sup>Γb1βb<sup>1</sup> q ¼ <sup>5</sup>:90, and <sup>E</sup>b<sup>0</sup> <sup>1</sup> <sup>¼</sup> <sup>k</sup><sup>1</sup> Ab0 <sup>1</sup>βb<sup>1</sup> ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi β0 <sup>1</sup>Γb1βb<sup>1</sup> q ¼ ½ -<sup>2</sup>:<sup>88</sup> 1:53 1:00 0:<sup>49</sup>

respectively, whereas at stage 2, the estimated MLGSI accuracy, selection response, and expected genetic gain were <sup>b</sup>ρHI<sup>2</sup> <sup>¼</sup> ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi w0 Γb<sup>∗</sup>w w0 Cb<sup>∗</sup>w s <sup>¼</sup> <sup>0</sup>:64, <sup>R</sup>b<sup>2</sup> <sup>¼</sup> <sup>k</sup><sup>2</sup> ffiffiffiffiffiffiffiffiffiffiffiffiffiffi w0 Γb<sup>∗</sup>w p ¼ <sup>4</sup>:10,

and b E0 <sup>2</sup> <sup>¼</sup> <sup>k</sup><sup>2</sup> <sup>Γ</sup>b<sup>∗</sup><sup>w</sup> ffiffiffiffiffiffiffiffiffiffiffiffiffiffi w0 Γb<sup>∗</sup>w p ¼ ½ - <sup>1</sup>:<sup>74</sup> 0:92 0:85 0:<sup>58</sup> respectively. The estimated MLGSI accuracy, selection response, and expected genetic gain at stage 2 were lower than at stage 1. This means that the adjusted matrices <sup>Γ</sup>b<sup>∗</sup> and <sup>C</sup>b<sup>∗</sup> negatively affected the estimated MLPSI parameters at stage 2. The total estimated MLGSI selection response and expected genetic gain for stages 1 and 2 were <sup>R</sup>b<sup>1</sup> <sup>þ</sup> <sup>R</sup>b<sup>2</sup> <sup>¼</sup> <sup>9</sup>:<sup>99</sup> and Eb<sup>0</sup> <sup>1</sup> <sup>þ</sup> <sup>E</sup>b<sup>0</sup> <sup>2</sup> ¼ ½ -<sup>4</sup>:<sup>62</sup> 2:45 1:85 1:<sup>07</sup> .
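The stage-1 and stage-2 quantities above can be recomputed from the reported matrices. The following is a minimal sketch rather than the authors' code: it assumes the economic weights are $\mathbf{w}' = [1 \;\; 1 \;\; 1 \;\; 1]$ (the vector stated for these simulated data in Sect. 9.5.2), and the helper names (`dot`, `matvec`, `solve_2x2`) are ours.

```python
import math

# Estimates quoted in the text. W = [1, 1, 1, 1] is assumed to be the vector
# of economic weights used for these simulated data.
GAMMA1 = [[16.26, 6.51], [6.51, 5.79]]
A1 = [[16.26, 6.51, 5.60, 2.29], [6.51, 5.79, 2.33, 1.62]]
GAMMA_STAR = [
    [7.96, 2.11, 2.71, 0.88],
    [2.11, 3.46, 0.80, 0.87],
    [2.71, 0.80, 2.75, 0.45],
    [0.88, 0.87, 0.45, 2.38],
]
W = [1.0, 1.0, 1.0, 1.0]
K1, K2 = 0.744, 0.721


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def solve_2x2(m, rhs):
    """Solve m x = rhs for a 2 x 2 matrix by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [
        (m[1][1] * rhs[0] - m[0][1] * rhs[1]) / det,
        (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det,
    ]


# Stage 1: beta_1 = Gamma_1^{-1} A_1 w.
beta1 = solve_2x2(GAMMA1, matvec(A1, W))
sd_i1 = math.sqrt(dot(beta1, matvec(GAMMA1, beta1)))
response1 = K1 * sd_i1
gain1 = [K1 * x / sd_i1 for x in matvec(transpose(A1), beta1)]

# Stage 2: beta_2 = w, with Gamma adjusted for selection at stage 1.
sd_i2 = math.sqrt(dot(W, matvec(GAMMA_STAR, W)))
response2 = K2 * sd_i2
gain2 = [K2 * x / sd_i2 for x in matvec(GAMMA_STAR, W)]

print([round(b, 2) for b in beta1])
print(round(response1, 2), [round(g, 2) for g in gain1])
print(round(response2, 2), [round(g, 2) for g in gain2])
```

Because the matrices are reported to two decimals, the recomputed values should be expected to agree with the reported ones only to about the second decimal place.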

#### 9.5 The Multistage Restricted Linear Genomic Selection Index (MRLGSI)

The restricted linear genomic selection index (RLGSI) described in Chap. 3 is extended to the multistage restricted linear genomic selection index (MRLGSI) context in a two-stage breeding selection scheme.

### 9.5.1 The MRLGSI Parameters

In Sect. 9.4.1, we indicated that the MLGSI vector of coefficients at stage 1 can be written as $\boldsymbol{\beta}_1' = \mathbf{w}'\mathbf{A}_1'\boldsymbol{\Gamma}_1^{-1} = [\beta_{11} \;\; \beta_{12}]$ and at stage 2 as $\boldsymbol{\beta}_2' = \mathbf{w}'\mathbf{A}\boldsymbol{\Gamma}^{-1} = \mathbf{w}' = [w_1 \;\; w_2 \;\; w_3 \;\; w_4]$. It can be shown that the MRLGSI vector of coefficients is a linear transformation of vectors $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ made by matrix $\mathbf{K}_G$, which is a projector (see Chaps. 3 and 6 for details) that projects $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ into a space smaller than the original space of $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$. Thus, at stages 1 and 2, the MRLGSI vector of coefficients is

$$
\boldsymbol{\beta}_{R_1} = \mathbf{K}_{G_1}\boldsymbol{\beta}_1 \tag{9.29}
$$

and

$$
\boldsymbol{\beta}_{R_2} = \mathbf{K}_{G_2}\boldsymbol{\beta}_2 = \mathbf{K}_{G_2}\mathbf{w}, \tag{9.30}
$$

respectively, where $\mathbf{K}_{G_1} = \mathbf{I} - \mathbf{Q}_{G_1}$, $\mathbf{Q}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\boldsymbol{\Gamma}_1$, $\mathbf{K}_{G_2} = \mathbf{I} - \mathbf{Q}_{G_2}$, and $\mathbf{Q}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2)^{-1}\mathbf{U}_2'\boldsymbol{\Gamma}$ are matrix projectors. By Eqs. (9.29) and (9.30), the MRLGSI at stages 1 and 2 can be written as $I_{R_1} = \boldsymbol{\beta}_{R_1}'\boldsymbol{\gamma}_1$ and $I_{R_2} = \boldsymbol{\beta}_{R_2}'\boldsymbol{\gamma}$ respectively, where $\boldsymbol{\gamma}_1' = [\gamma_1 \;\; \gamma_2]$ and $\boldsymbol{\gamma}' = [\gamma_1 \;\; \gamma_2 \;\; \gamma_3 \;\; \gamma_4]$ are vectors of genomic breeding values, which can be estimated using GEBVs, as described in Chap. 5. In Chap. 6 we described methods for constructing matrix $\mathbf{U}'$ and estimating matrix $\mathbf{K}_G$; those methods are also valid in the MRLGSI context.
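As an illustration of Eq. (9.29), the sketch below builds the stage-1 projector for the case in which a single trait is restricted, so that $\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1$ reduces to a scalar. It uses the estimate $\widehat{\boldsymbol{\Gamma}}_1$ and the rounded coefficients $\widehat{\boldsymbol{\beta}}_1' = [1.39 \;\; 1.25]$ from the numerical example of Sect. 9.4.3 and assumes that trait 2 is the restricted trait; the function names are ours and the single-trait shortcut is a simplification, not a general implementation.

```python
GAMMA1 = [[16.26, 6.51], [6.51, 5.79]]
BETA1 = [1.39, 1.25]  # rounded stage-1 MLGSI coefficients quoted in the text


def restriction_projector(gamma, j):
    """Return K = I - U (U' Gamma U)^{-1} U' Gamma when U selects trait j only."""
    n = len(gamma)
    k = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for c in range(n):
        # Only row j of Q = U (U' Gamma U)^{-1} U' Gamma is non-zero.
        k[j][c] -= gamma[j][c] / gamma[j][j]
    return k


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


K_G1 = restriction_projector(GAMMA1, 1)  # trait 2 has index 1
beta_r1 = matvec(K_G1, BETA1)

# Covariance between the restricted index and the breeding value of trait 2.
cov_with_trait2 = sum(g * b for g, b in zip(GAMMA1[1], beta_r1))

print(K_G1)
print(beta_r1, cov_with_trait2)
```

Two properties are worth checking: `K_G1` is idempotent, as a projector should be, and the covariance between the restricted index and the restricted trait is zero up to floating-point error. With the rounded input coefficients the second element of `beta_r1` comes out near -1.56.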

In a similar manner to the MLGSI context, MRLGSI accuracies, expected genetic gains per trait, and selection responses for stages 1 and 2 in the testing population can be written as

$$
\rho_{HI_1} = \sqrt{\frac{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad \text{and} \quad \rho_{HI_2} = \sqrt{\frac{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}^{*}\boldsymbol{\beta}_{R_2}}{\mathbf{w}'\mathbf{C}^{*}\mathbf{w}}}, \tag{9.31}
$$

$$
\mathbf{E}_{R_1} = k_1\frac{\mathbf{A}_1'\boldsymbol{\beta}_{R_1}}{\sqrt{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}}} \quad \text{and} \quad \mathbf{E}_{R_2} = k_2\frac{\boldsymbol{\Gamma}^{*}\boldsymbol{\beta}_{R_2}}{\sqrt{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}^{*}\boldsymbol{\beta}_{R_2}}} \tag{9.32}
$$

and

$$
R_{R_1} = k_1\sqrt{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}} \quad \text{and} \quad R_{R_2} = k_2\sqrt{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}^{*}\boldsymbol{\beta}_{R_2}}, \tag{9.33}
$$

respectively. The total MRLGSI expected genetic gain per trait and selection response for both stages are equal to $\mathbf{E}_{R_1} + \mathbf{E}_{R_2}$ and $R_{R_1} + R_{R_2}$. To simplify the notation, in Eqs. (9.32) and (9.33), we have omitted the intervals between stages or selection cycles ($L_G$). Matrices $\boldsymbol{\Gamma}^{*}$ and $\mathbf{C}^{*}$ in Eqs. (9.31) to (9.33) are matrices $\boldsymbol{\Gamma}$ and $\mathbf{C}$ adjusted for previous selection.

In the MRLGSI context, matrices $\boldsymbol{\Gamma}^{*}$ and $\mathbf{C}^{*}$ can be obtained as

$$
\boldsymbol{\Gamma}^{*} = \boldsymbol{\Gamma} - u\,\frac{\mathbf{A}_1'\boldsymbol{\beta}_{R_1}\boldsymbol{\beta}_{R_1}'\mathbf{A}_1}{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}} \tag{9.34}
$$

and

$$
\mathbf{C}^{*} = \mathbf{C} - u\,\frac{\mathbf{G}_1'\mathbf{b}_{R_1}\mathbf{b}_{R_1}'\mathbf{G}_1}{\mathbf{b}_{R_1}'\mathbf{P}_1\mathbf{b}_{R_1}}, \tag{9.35}
$$

where $\boldsymbol{\beta}_{R_1}$ was defined in Eq. (9.29) and vector $\mathbf{b}_{R_1}$ can be obtained according to the RLPSI as described in Chap. 3. The term $u = k(k - \tau)$ was defined earlier.
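Equation (9.34) subtracts a rank-one matrix from $\boldsymbol{\Gamma}$. The sketch below evaluates that matrix for the stage-1 estimates of Sect. 9.4.3 and checks one consequence: within the block of traits used at stage 1, the variance of the stage-1 index is reduced by the factor $(1 - u)$. Several things here are assumptions rather than values from the chapter: `U_ILLUSTRATIVE = 0.5` is a hypothetical value of $u$ (the text does not report $u$ for this example), the restricted coefficients are taken as $[1.39 \;\; -1.558]$ after rounding, and the function names are ours.

```python
GAMMA1 = [[16.26, 6.51], [6.51, 5.79]]
A1 = [[16.26, 6.51, 5.60, 2.29], [6.51, 5.79, 2.33, 1.62]]
BETA_R1 = [1.39, -1.558]  # restricted stage-1 coefficients (rounded)
U_ILLUSTRATIVE = 0.5      # hypothetical value of u = k(k - tau)


def index_variance(beta, gamma):
    """Return beta' Gamma beta."""
    n = len(beta)
    return sum(beta[i] * gamma[i][j] * beta[j] for i in range(n) for j in range(n))


def adjustment_term(a1, gamma1, beta):
    """Return A1' b b' A1 / (b' Gamma1 b), the matrix multiplied by u in Eq. (9.34)."""
    a1t_beta = [sum(a1[i][j] * beta[i] for i in range(len(beta)))
                for j in range(len(a1[0]))]
    var_index = index_variance(beta, gamma1)
    return [[x * y / var_index for y in a1t_beta] for x in a1t_beta]


term = adjustment_term(A1, GAMMA1, BETA_R1)

# Leading 2 x 2 block of Gamma*: the first two columns of A1 equal Gamma1 here,
# so this block can be formed without the full 4 x 4 matrix Gamma.
gamma1_star = [[GAMMA1[i][j] - U_ILLUSTRATIVE * term[i][j] for j in range(2)]
               for i in range(2)]

var_before = index_variance(BETA_R1, GAMMA1)
var_after = index_variance(BETA_R1, gamma1_star)
print(var_before, var_after)
```

The full adjusted matrix $\widehat{\boldsymbol{\Gamma}}^{*}$ would require the complete $4 \times 4$ estimate $\widehat{\boldsymbol{\Gamma}}$, which is why only the leading block is formed in this sketch.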

The correlation between $I_{R_1} = \boldsymbol{\beta}_{R_1}'\boldsymbol{\gamma}_1$ and $I_{R_2} = \boldsymbol{\beta}_{R_2}'\boldsymbol{\gamma}$ can be written as

$$
\rho_{I_{R_1}I_{R_2}} = \frac{\boldsymbol{\beta}_{R_1}'\mathbf{A}_1\boldsymbol{\beta}_{R_2}}{\sqrt{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}}\,\sqrt{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{R_2}}}, \tag{9.36}
$$

where $\sqrt{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}}$ and $\sqrt{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{R_2}}$ are the standard deviations of $I_{R_1} = \boldsymbol{\beta}_{R_1}'\boldsymbol{\gamma}_1$ and $I_{R_2} = \boldsymbol{\beta}_{R_2}'\boldsymbol{\gamma}$ respectively. In Eq. (9.36), matrix $\boldsymbol{\Gamma}$ was not adjusted for previous selection on $I_{R_1} = \boldsymbol{\beta}_{R_1}'\boldsymbol{\gamma}_1$.

### 9.5.2 Numerical Examples

To illustrate the MRLGSI theory in a two-stage breeding selection scheme, we use the simulated data described in Sect. 9.4.3. In that subsection we indicated that the estimated covariance matrices of $\boldsymbol{\Gamma}_1$ and $\mathbf{A}_1$ were

$$
\widehat{\boldsymbol{\Gamma}}_1 = \begin{bmatrix} 16.26 & 6.51 \\ 6.51 & 5.79 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{A}}_1 = \begin{bmatrix} 16.26 & 6.51 & 5.60 & 2.29 \\ 6.51 & 5.79 & 2.33 & 1.62 \end{bmatrix},
$$

and that $\widehat{\boldsymbol{\beta}}_1' = \mathbf{w}'\widehat{\mathbf{A}}_1'\widehat{\boldsymbol{\Gamma}}_1^{-1} = [1.39 \;\; 1.25]$ was the estimated MLGSI vector of coefficients at stage 1. At stage 2, the estimated MLGSI vector of coefficients was $\mathbf{w}' = [1 \;\; 1 \;\; 1 \;\; 1]$, the vector of economic weights.

Suppose that we restrict only trait 2; then at stages 1 and 2, matrix $\mathbf{U}_1' = [0 \;\; 1]$ and matrix $\mathbf{U}_2' = [0 \;\; 1 \;\; 0 \;\; 0]$ respectively. In addition, $\widehat{\mathbf{Q}}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1$, $\widehat{\mathbf{Q}}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_2)^{-1}\mathbf{U}_2'\widehat{\boldsymbol{\Gamma}}$, $\widehat{\mathbf{K}}_{G_1} = \mathbf{I} - \widehat{\mathbf{Q}}_{G_1}$, and $\widehat{\mathbf{K}}_{G_2} = \mathbf{I} - \widehat{\mathbf{Q}}_{G_2}$ are the estimated matrices described in Eqs. (9.29) and (9.30) for stages 1 and 2. It can be shown that, at stages 1 and 2, $\widehat{\boldsymbol{\beta}}_{R_1}' = \widehat{\boldsymbol{\beta}}_1'\widehat{\mathbf{K}}_{G_1}' = [1.39 \;\; -1.558]$ and $\widehat{\boldsymbol{\beta}}_{R_2}' = \mathbf{w}'\widehat{\mathbf{K}}_{G_2}' = [1.0 \;\; -1.81 \;\; 1.0 \;\; 1.0]$ are the MRLGSI vectors of coefficients respectively.

Suppose that the total proportion retained for the two stages was 20%; then at stage 1, $k_1 = 0.744$ is an associated approximated selection intensity and the estimated MRLGSI selection response, expected genetic gain per trait, and accuracy were

$$
\widehat{R}_{R_1} = k_1\sqrt{\widehat{\boldsymbol{\beta}}_{R_1}'\widehat{\boldsymbol{\Gamma}}_1\widehat{\boldsymbol{\beta}}_{R_1}} = 3.083, \quad \widehat{\mathbf{E}}_{R_1}' = [2.225 \;\; 0 \;\; 0.742 \;\; 0.117], \quad \text{and} \quad \widehat{\rho}_{HI_1} = \sqrt{\frac{\widehat{\boldsymbol{\beta}}_{R_1}'\widehat{\boldsymbol{\Gamma}}_1\widehat{\boldsymbol{\beta}}_{R_1}}{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}} = 0.370
$$

respectively. The estimated MRLGSI expected genetic gain, accuracy, and selection response at stage 2 were

$$
\widehat{\mathbf{E}}_{R_2}' = k_2\frac{\widehat{\boldsymbol{\beta}}_{R_2}'\widehat{\boldsymbol{\Gamma}}^{*}}{\sqrt{\widehat{\boldsymbol{\beta}}_{R_2}'\widehat{\boldsymbol{\Gamma}}^{*}\widehat{\boldsymbol{\beta}}_{R_2}}} = [1.156 \;\; 0 \;\; 0.793 \;\; 0.536], \quad \widehat{\rho}_{HI_2} = \sqrt{\frac{\widehat{\boldsymbol{\beta}}_{R_2}'\widehat{\boldsymbol{\Gamma}}^{*}\widehat{\boldsymbol{\beta}}_{R_2}}{\mathbf{w}'\widehat{\mathbf{C}}^{*}\mathbf{w}}} = 0.32, \quad \text{and} \quad \widehat{R}_{R_2} = k_2\sqrt{\widehat{\boldsymbol{\beta}}_{R_2}'\widehat{\boldsymbol{\Gamma}}^{*}\widehat{\boldsymbol{\beta}}_{R_2}} = 2.485
$$

respectively, where $k_2 = 0.721$ was the approximated selection intensity value for stage 2. The estimated total MRLGSI selection response and expected genetic gain at stages 1 and 2 were $\widehat{R}_{R_1} + \widehat{R}_{R_2} = 5.568$ and $\widehat{\mathbf{E}}_{R_1}' + \widehat{\mathbf{E}}_{R_2}' = [3.380 \;\; 0 \;\; 1.535 \;\; 0.653]$ respectively. Note that the expected genetic gain for trait 2 was 0, as required by the restriction.
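The stage-1 MRLGSI values can be reproduced from $\widehat{\boldsymbol{\Gamma}}_1$, $\widehat{\mathbf{A}}_1$, and $k_1$ alone. The sketch below is illustrative only: it assumes $\mathbf{w}' = [1 \;\; 1 \;\; 1 \;\; 1]$ and a restriction on trait 2, and it uses the fact that with $\mathbf{U}_1' = [0 \;\; 1]$ the projector leaves the first coefficient unchanged and replaces the second by $-(\widehat{\Gamma}_{21}/\widehat{\Gamma}_{22})\widehat{\beta}_{11}$. Variable names are ours.

```python
import math

GAMMA1 = [[16.26, 6.51], [6.51, 5.79]]
A1 = [[16.26, 6.51, 5.60, 2.29], [6.51, 5.79, 2.33, 1.62]]
W = [1.0, 1.0, 1.0, 1.0]  # assumed economic weights
K1 = 0.744


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


# Unrestricted stage-1 coefficients: solve Gamma1 beta = A1 w (2 x 2 system).
rhs = matvec(A1, W)
det = GAMMA1[0][0] * GAMMA1[1][1] - GAMMA1[0][1] * GAMMA1[1][0]
beta1 = [
    (GAMMA1[1][1] * rhs[0] - GAMMA1[0][1] * rhs[1]) / det,
    (GAMMA1[0][0] * rhs[1] - GAMMA1[1][0] * rhs[0]) / det,
]

# Restricting trait 2 (U1' = [0 1]) changes only the second coefficient.
beta_r1 = [beta1[0], -GAMMA1[1][0] * beta1[0] / GAMMA1[1][1]]

sd_r1 = math.sqrt(dot(beta_r1, matvec(GAMMA1, beta_r1)))
response_r1 = K1 * sd_r1
a1t_beta = [sum(A1[i][j] * beta_r1[i] for i in range(2)) for j in range(4)]
gain_r1 = [K1 * x / sd_r1 for x in a1t_beta]

print([round(b, 3) for b in beta_r1])
print(round(response_r1, 3), [round(g, 3) for g in gain_r1])
```

The second element of `gain_r1` is zero up to floating-point error, which is the numerical counterpart of the restriction on trait 2.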

#### 9.6 The Multistage Predetermined Proportional Gain Linear Genomic Selection Index

The MPPG-LGSI is an adaptation of the predetermined proportional gain linear genomic selection index (PPG-LGSI) described in Chap. 6; thus, the theoretical results, properties, and objectives of both indices are similar. The MPPG-LGSI objective is to change $\mu_q$ to $\mu_q + d_q$, where $d_q$ is a predetermined change in $\mu_q$. We solve this problem by minimizing the mean squared difference between $I = \boldsymbol{\beta}'\boldsymbol{\gamma}$ and $H = \mathbf{w}'\mathbf{g}$, that is, $E[(H - I)^2]$, under the restriction $\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \theta_G\mathbf{d}$, where $\theta_G$ is a proportionality constant, $\mathbf{d}' = [d_1 \;\; d_2 \;\; \ldots \;\; d_r]$ is the vector of predetermined restrictions, $\mathbf{U}'$ is a $(t - 1) \times t$ matrix of 1s and 0s, and $\boldsymbol{\Gamma}$ is a covariance matrix of additive genomic breeding values, $\boldsymbol{\gamma}' = [\gamma_1 \;\; \gamma_2 \;\; \ldots \;\; \gamma_t]$, where $r$ is the number of predetermined restrictions and $t$ the number of traits.

### 9.6.1 The MPPG-LGSI Parameters

According to the results in Chap. 6, at stages 1 and 2, the MPPG-LGSI vector of coefficients can be written as

$$
\boldsymbol{\beta}_{P_1} = \boldsymbol{\beta}_{R_1} + \theta_1\mathbf{U}_1\left(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1\right)^{-1}\mathbf{d} \tag{9.37}
$$

and

$$
\boldsymbol{\beta}_{P_2} = \boldsymbol{\beta}_{R_2} + \theta_2\mathbf{U}_2\left(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2\right)^{-1}\mathbf{d}, \tag{9.38}
$$

respectively, where $\boldsymbol{\beta}_{R_1} = \mathbf{K}_{G_1}\boldsymbol{\beta}_1$, $\boldsymbol{\beta}_{R_2} = \mathbf{K}_{G_2}\boldsymbol{\beta}_2 = \mathbf{K}_{G_2}\mathbf{w}$, $\mathbf{K}_{G_1} = \mathbf{I} - \mathbf{Q}_{G_1}$, $\mathbf{Q}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\boldsymbol{\Gamma}_1$, $\mathbf{K}_{G_2} = \mathbf{I} - \mathbf{Q}_{G_2}$, and $\mathbf{Q}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2)^{-1}\mathbf{U}_2'\boldsymbol{\Gamma}$ were described in Eqs. (9.29) and (9.30). Also, it can be shown that the proportionality constants for stages 1 ($\theta_1$) and 2 ($\theta_2$) are

$$
\theta_1 = \frac{\mathbf{d}'\left(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1\right)^{-1}\mathbf{U}_1'\mathbf{A}_1\mathbf{w}}{\mathbf{d}'\left(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1\right)^{-1}\mathbf{d}} \quad \text{and} \quad \theta_2 = \frac{\mathbf{d}'\left(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2\right)^{-1}\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{w}}{\mathbf{d}'\left(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2\right)^{-1}\mathbf{d}}, \tag{9.39}
$$

respectively. By Eqs. (9.37) to (9.39), the MPPG-LGSI for stages 1 and 2 can be written as $I_{P_1} = \boldsymbol{\beta}_{P_1}'\boldsymbol{\gamma}_1$ and $I_{P_2} = \boldsymbol{\beta}_{P_2}'\boldsymbol{\gamma}$ respectively, where $\boldsymbol{\gamma}_1$ and $\boldsymbol{\gamma}$ are vectors of genomic breeding values, which can be estimated using GEBVs (see Chap. 5 for details).

For stages 1 and 2, the MPPG-LGSI accuracies ($\rho_{HI_1}$ and $\rho_{HI_2}$), expected genetic gains per trait ($\mathbf{E}_{P_1}$ and $\mathbf{E}_{P_2}$), and selection responses ($R_{P_1}$ and $R_{P_2}$) can be written as

$$
\rho_{HI_1} = \sqrt{\frac{\boldsymbol{\beta}_{P_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{P_1}}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad \text{and} \quad \rho_{HI_2} = \sqrt{\frac{\boldsymbol{\beta}_{P_2}'\boldsymbol{\Gamma}^{*}\boldsymbol{\beta}_{P_2}}{\mathbf{w}'\mathbf{C}^{*}\mathbf{w}}}, \tag{9.40}
$$

$$
\mathbf{E}_{P_1} = k_1\frac{\mathbf{A}_1'\boldsymbol{\beta}_{P_1}}{\sqrt{\boldsymbol{\beta}_{P_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{P_1}}} \quad \text{and} \quad \mathbf{E}_{P_2} = k_2\frac{\boldsymbol{\Gamma}^{*}\boldsymbol{\beta}_{P_2}}{\sqrt{\boldsymbol{\beta}_{P_2}'\boldsymbol{\Gamma}^{*}\boldsymbol{\beta}_{P_2}}} \tag{9.41}
$$

and

$$
R_{P_1} = k_1\sqrt{\boldsymbol{\beta}_{P_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{P_1}} \quad \text{and} \quad R_{P_2} = k_2\sqrt{\boldsymbol{\beta}_{P_2}'\boldsymbol{\Gamma}^{*}\boldsymbol{\beta}_{P_2}}, \tag{9.42}
$$

respectively. The total MPPG-LGSI expected genetic gain per trait and selection response at both stages are equal to $\mathbf{E}_{P_1} + \mathbf{E}_{P_2}$ and $R_{P_1} + R_{P_2}$. To simplify the notation, in Eqs. (9.41) and (9.42), we omitted the intervals between stages or selection cycles ($L_G$). Matrices $\boldsymbol{\Gamma}^{*}$ and $\mathbf{C}^{*}$ are matrices $\boldsymbol{\Gamma}$ and $\mathbf{C}$ adjusted for previous selection on $I_{P_1}$ according to Eqs. (9.34) and (9.35) respectively in the MPPG-LGSI context.

The correlation between $I_{P_1} = \boldsymbol{\beta}_{P_1}'\boldsymbol{\gamma}_1$ and $I_{P_2} = \boldsymbol{\beta}_{P_2}'\boldsymbol{\gamma}$ can be written as

$$
\rho_{12} = \frac{\boldsymbol{\beta}_{P_1}'\mathbf{A}_1\boldsymbol{\beta}_{P_2}}{\sqrt{\boldsymbol{\beta}_{P_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{P_1}}\,\sqrt{\boldsymbol{\beta}_{P_2}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{P_2}}}. \tag{9.43}
$$

In Eq. (9.43), matrix $\boldsymbol{\Gamma}$ was not adjusted for previous selection on $I_{P_1} = \boldsymbol{\beta}_{P_1}'\boldsymbol{\gamma}_1$.

### 9.6.2 Numerical Examples

To illustrate the MPPG-LGSI theory, we use the simulated data described in Sect. 9.4.3. Suppose that we select two traits at stages 1 and 2; then, at stage 1,

$$
\widehat{\boldsymbol{\Gamma}}_1 = \begin{bmatrix} 16.26 & 6.51 \\ 6.51 & 5.79 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{A}}_1 = \begin{bmatrix} 16.26 & 6.51 & 5.60 & 2.29 \\ 6.51 & 5.79 & 2.33 & 1.62 \end{bmatrix}
$$

are the estimated covariance matrices of $\boldsymbol{\Gamma}_1$ and $\mathbf{A}_1$ respectively. We restricted trait 2 with $d = 2$; then, at stage 1, matrix $\mathbf{U}_1' = [0 \;\; 1]$ and, at stage 2, matrix $\mathbf{U}_2' = [0 \;\; 1 \;\; 0 \;\; 0]$. In addition, $\widehat{\mathbf{Q}}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1$, $\widehat{\mathbf{Q}}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_2)^{-1}\mathbf{U}_2'\widehat{\boldsymbol{\Gamma}}$, $\widehat{\mathbf{K}}_{G_1} = \mathbf{I} - \widehat{\mathbf{Q}}_{G_1}$, and $\widehat{\mathbf{K}}_{G_2} = \mathbf{I} - \widehat{\mathbf{Q}}_{G_2}$ are the estimates of the matrix projectors associated with stages 1 and 2 (see Eqs. 9.37 and 9.38 for details).

In Sect. 9.5.2, we showed that the estimated MRLGSI vector of coefficients for stage 1 was $\widehat{\boldsymbol{\beta}}_{R_1}' = \widehat{\boldsymbol{\beta}}_1'\widehat{\mathbf{K}}_{G_1}' = [1.386 \;\; -1.558]$. Thus, by Eq. (9.37), to obtain $\widehat{\boldsymbol{\beta}}_{P_1} = \widehat{\boldsymbol{\beta}}_{R_1} + \widehat{\theta}_1\mathbf{U}_1(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}\mathbf{d}$, we only need to obtain $\widehat{\theta}_1$ and $\mathbf{U}_1(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}\mathbf{d}$, where $d = 2$ and

$$
\widehat{\theta}_1 = \frac{\mathbf{d}'\left(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1\mathbf{U}_1\right)^{-1}\mathbf{U}_1'\widehat{\mathbf{A}}_1\mathbf{w}}{\mathbf{d}'\left(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1\mathbf{U}_1\right)^{-1}\mathbf{d}}.
$$

It can be shown that $\mathbf{U}_1(\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}\mathbf{d} = [0 \;\; 0.345]'$ and $\widehat{\theta}_1 = 8.125$; therefore, $\widehat{\boldsymbol{\beta}}_{P_1}' = [1.39 \;\; 1.25]$ is the MPPG-LGSI vector of coefficients at stage 1.
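The stage-1 proportionality constant and the resulting coefficients can be checked numerically. The following sketch assumes a single restricted trait (trait 2), $d = 2$, $\mathbf{w}' = [1 \;\; 1 \;\; 1 \;\; 1]$, and the rounded restricted coefficients $[1.386 \;\; -1.558]$; with one restriction, $\mathbf{U}_1'\widehat{\boldsymbol{\Gamma}}_1\mathbf{U}_1$ is the scalar $\widehat{\Gamma}_{22}$. The function name `ppg_constant` is ours.

```python
GAMMA1 = [[16.26, 6.51], [6.51, 5.79]]
A1 = [[16.26, 6.51, 5.60, 2.29], [6.51, 5.79, 2.33, 1.62]]
W = [1.0, 1.0, 1.0, 1.0]    # assumed economic weights
BETA_R1 = [1.386, -1.558]   # rounded restricted stage-1 coefficients
D = 2.0                     # predetermined gain imposed on trait 2
J = 1                       # index of the restricted trait (trait 2)


def ppg_constant(gamma, a, w, j, d):
    """theta for one restricted trait j, following the structure of Eq. (9.39)."""
    s_inv = 1.0 / gamma[j][j]                     # (U' Gamma U)^{-1}
    u_a_w = sum(x * y for x, y in zip(a[j], w))   # U' A w
    return (d * s_inv * u_a_w) / (d * s_inv * d)


theta1 = ppg_constant(GAMMA1, A1, W, J, D)

# U (U' Gamma U)^{-1} d: zero everywhere except at the restricted trait.
shift = [0.0, 0.0]
shift[J] = D / GAMMA1[J][J]

beta_p1 = [b + theta1 * s for b, s in zip(BETA_R1, shift)]
print(theta1, shift, beta_p1)
```

In this particular example the correction term restores the unrestricted stage-1 coefficients, which is consistent with the text reporting the same vector for the MLGSI and the MPPG-LGSI at stage 1.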

Suppose that the total proportion retained for the two stages was 20%; then, $k_1 = 0.744$ is an approximate selection intensity associated with the MPPG-LGSI, and the estimated MPPG-LGSI accuracy, selection response, and expected genetic gain at stage 1 were

$$
\widehat{\rho}_{HI_1} = \sqrt{\frac{\widehat{\boldsymbol{\beta}}_{P_1}'\widehat{\boldsymbol{\Gamma}}_1\widehat{\boldsymbol{\beta}}_{P_1}}{\mathbf{w}'\widehat{\mathbf{C}}\mathbf{w}}} = 0.71, \quad \widehat{R}_{P_1} = k_1\sqrt{\widehat{\boldsymbol{\beta}}_{P_1}'\widehat{\boldsymbol{\Gamma}}_1\widehat{\boldsymbol{\beta}}_{P_1}} = 5.90, \quad \text{and} \quad \widehat{\mathbf{E}}_{P_1}' = k_1\frac{\widehat{\boldsymbol{\beta}}_{P_1}'\widehat{\mathbf{A}}_1}{\sqrt{\widehat{\boldsymbol{\beta}}_{P_1}'\widehat{\boldsymbol{\Gamma}}_1\widehat{\boldsymbol{\beta}}_{P_1}}} = [2.88 \;\; 1.53 \;\; 1.00 \;\; 0.49]
$$

respectively.

It can be shown that at stage 2, $\mathbf{d}'(\mathbf{U}_2'\widehat{\boldsymbol{\Gamma}}\mathbf{U}_2)^{-1}\mathbf{U}_2' = [0 \;\; 0.345 \;\; 0 \;\; 0]$, $\widehat{\theta}_2 = 8.125$, and $\widehat{\boldsymbol{\beta}}_{P_2}' = \mathbf{w}' = [1 \;\; 1 \;\; 1 \;\; 1]$. Thus, the estimated MPPG-LGSI accuracy, selection response, and expected genetic gain at this stage were

$$
\widehat{\rho}_{HI_2} = \sqrt{\frac{\mathbf{w}'\widehat{\boldsymbol{\Gamma}}^{*}\mathbf{w}}{\mathbf{w}'\widehat{\mathbf{C}}^{*}\mathbf{w}}} = 0.64, \quad \widehat{R}_{P_2} = k_2\sqrt{\mathbf{w}'\widehat{\boldsymbol{\Gamma}}^{*}\mathbf{w}} = 4.10, \quad \text{and} \quad \widehat{\mathbf{E}}_{P_2}' = k_2\frac{\mathbf{w}'\widehat{\boldsymbol{\Gamma}}^{*}}{\sqrt{\mathbf{w}'\widehat{\boldsymbol{\Gamma}}^{*}\mathbf{w}}} = [1.74 \;\; 0.92 \;\; 0.85 \;\; 0.58]
$$

respectively, where $k_2 = 0.721$. The estimated total MPPG-LGSI selection response and expected genetic gain for both stages were $\widehat{R}_{P_1} + \widehat{R}_{P_2} = 9.99$ and $\widehat{\mathbf{E}}_{P_1}' + \widehat{\mathbf{E}}_{P_2}' = [4.62 \;\; 2.45 \;\; 1.85 \;\; 1.07]$ respectively. Note that the total expected genetic gain for trait 2 was 2.45, which is similar to $d = 2$, the PPG imposed by the breeder. Finally, to simplify the notation, we omitted the intervals between stages or selection cycles ($L_G$) in the estimated MPPG-LGSI selection response and expected genetic gain for both stages.


### Chapter 10 Stochastic Simulation of Four Linear Phenotypic Selection Indices

Fernando H. Toledo, José Crossa, and Juan Burgueño

Abstract Stochastic simulation can contribute to a better understanding of how selection indices behave, and it has already been applied successfully to evaluate other breeding scenarios. Although this book develops the theory for several types of indices that use phenotypic data, molecular marker data, or both, no examples have been presented of the long-term behavior of those indices. The objective of this chapter is to present results and insights from an in silico (computer simulation) comparison, across 50 selection cycles of a generic recurrent population breeding program, of restricted and unrestricted selection indices. The selection indices included in this stochastic simulation were the linear phenotypic selection index (LPSI), the eigen selection index method (ESIM), the restricted LPSI (RLPSI), and the restricted ESIM (RESIM).

#### 10.1 Stochastic Simulation

Simulations were used to evaluate the accuracy, effectiveness, response to selection, and decrease in the overall genetic variance in a recurrent selection scheme using the Smith (1936) and Hazel (1943) index (the linear phenotypic selection index, LPSI; see Chap. 2 for details); the eigen selection index method (ESIM; see Chap. 7 for details); the Kempthorne and Nordskog (1959) restricted index (K&N, or restricted linear phenotypic selection index, RLPSI; see Chap. 3 for details); and the restricted eigen selection index method (RESIM; see Chap. 7 for details). The scenarios are described below; they differ in the nature of the genetic correlation between traits and in the expected heritabilities of the traits.

F. H. Toledo (\*) · J. Crossa · J. Burgueño

Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico, Mexico e-mail: f.toledo@cgiar.org

#### 10.1.1 Breeding Design

A total of 50 forward recurrent selection cycles of modern breeding were simulated, in which the breeder is able to select on breeding value estimates of genetically correlated traits and to apply the selection indices mentioned above. All simulated scenarios (described below) followed a common general breeding design. In each cycle, 350 full-sib progenies (S1) were generated by taking 700 parents at random from the base population. From each progeny, 100 doubled haploid lines were randomly derived (which shortened the cycle interval by five inbreeding generations). The simulated phenotypic values of the 35,000 resulting lines were then evaluated in simulated trials. Selection was based on the average performance of each progeny. The selected progenies (top quarter) according to each index were then recombined by randomly mating a sample of the lines within the progeny to recover the population for the next cycle.
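The selection step of this design (rank progenies by the mean index value of their lines and keep the top quarter) can be sketched as follows. This is an illustration only: the function name, the dictionary layout, and the choice to round the number of selected progenies down are assumptions, since the text does not say how the top quarter of 350 progenies was rounded.

```python
N_PROGENIES = 350
LINES_PER_PROGENY = 100


def select_top_quarter(index_values_by_progeny):
    """Return the ids of the top quarter of progenies ranked by mean index value.

    index_values_by_progeny maps a progeny id to the index values of its lines.
    The number selected is rounded down (an assumption of this sketch).
    """
    means = {pid: sum(vals) / len(vals)
             for pid, vals in index_values_by_progeny.items()}
    n_selected = len(means) // 4
    ranked = sorted(means, key=means.get, reverse=True)
    return ranked[:n_selected]


# Toy usage with eight hypothetical progenies of two lines each.
demo = {
    "p1": [1.0, 3.0], "p2": [5.0, 7.0], "p3": [0.0, 0.0], "p4": [2.0, 2.0],
    "p5": [9.0, 9.0], "p6": [4.0, 4.0], "p7": [3.0, 3.0], "p8": [1.0, 1.0],
}
print(select_top_quarter(demo))
print(N_PROGENIES * LINES_PER_PROGENY)  # number of lines evaluated per cycle
```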

#### 10.1.2 Simulating Quantitative Traits

Genetically correlated quantitative traits were simulated assuming a full pleiotropic model. Genetic effects for all segregating sites were randomly sampled from a multivariate normal distribution with zero mean and a previously stated variance–covariance matrix. The genetic effects were in turn used to compute true breeding values (TBVs). An individual's phenotype was obtained by adding to its TBV a normally distributed random term with zero mean and a variance consistent with the expected heritability ($h^2$) of the trait being phenotyped. The genetic variance in each cycle was calculated as the variance of the TBVs of the individuals in that generation and was expressed relative to the genetic variance in the initial cycle. The realized response to selection was also standardized in units of the genetic standard deviation in cycle 0. Cycle 0 was used as the base generation because it represents the available genetic variability and allows the genetic changes in later breeding generations to be observed from the start.
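A minimal sketch of this sampling scheme for two traits is given below. It is not the authors' implementation: the unit effect variances, the 0/1/2 allele-dosage coding, the use of the sample variance of the TBVs, and all function names are assumptions made for illustration. The residual variance follows from the definition $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$.

```python
import math
import random


def sample_pleiotropic_effects(n_sites, rho, rng):
    """Draw per-site effects for two traits with unit variance and correlation rho."""
    effects = []
    for _ in range(n_sites):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]] applied to (z1, z2).
        effects.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return effects


def true_breeding_value(dosages, site_effects):
    """Sum of allele dosage times effect over all segregating sites."""
    return sum(d * e for d, e in zip(dosages, site_effects))


def residual_variance(genetic_variance, h2):
    """Residual variance giving heritability h2 = Vg / (Vg + Ve)."""
    return genetic_variance * (1.0 - h2) / h2


def simulate_phenotypes(tbvs, h2, rng):
    """Add zero-mean normal noise scaled to the target heritability."""
    mean = sum(tbvs) / len(tbvs)
    vg = sum((g - mean) ** 2 for g in tbvs) / (len(tbvs) - 1)
    sd_e = math.sqrt(residual_variance(vg, h2))
    return [g + rng.gauss(0.0, sd_e) for g in tbvs]


rng = random.Random(2018)
effects = sample_pleiotropic_effects(1000, 0.5, rng)
trait1_effects = [e[0] for e in effects]
tbvs = [true_breeding_value([rng.choice((0, 1, 2)) for _ in trait1_effects],
                            trait1_effects) for _ in range(50)]
phenotypes = simulate_phenotypes(tbvs, 0.2, rng)
```

For a trait with $h^2 = 0.2$ the residual variance is four times the genetic variance, whereas for $h^2 = 0.5$ the two are equal.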

An empirical genome was considered comprising a set of 10 linkage groups (chromosomes), each 200 cM in length, and 1000 uniformly distributed segregating sites. To represent the historical evolution and recent breeding efforts up to the present day in addition to incorporating a steady state of known linkage disequilibrium (LD) structure existing in crops, the starting populations (cycle 0) were taken after 200 generations of random mating within an effective population size of 1000 segregating for all loci in which the allele frequency was 0.5.

The in silico meiosis reflected the Mendelian laws of segregation for diploid species, by a count-location process that mimics the Haldane map function (Haldane 1919). Thus, homologous chromosomes are paired into bivalents and recombined through randomly positioned chiasmata. The number of chiasmata follows a Poisson distribution, where the λ parameter represents the chromosome length in Morgans and their positions are uniformly distributed, i.e., without interference between crossovers or any mutagenesis process.
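The count-location process can be sketched in a few lines. The code below is a simplified illustration, not the C++ routine used for the chapter: it places crossovers directly on the transmitted gamete, draws their number from a Poisson distribution whose mean is the chromosome length in Morgans (using Knuth's multiplication method, because the Python standard library has no Poisson sampler), and places them uniformly. All function names are ours.

```python
import math
import random


def haldane_recombination_fraction(distance_morgans):
    """Haldane map function: recombination fraction for a given map distance."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_crossovers(length_morgans, rng):
    """Poisson number of crossovers, uniformly placed (no interference)."""
    n = sample_poisson(length_morgans, rng)
    return sorted(rng.uniform(0.0, length_morgans) for _ in range(n))


def recombine(hap_a, hap_b, site_positions, crossovers):
    """Build a gamete that starts on hap_a and switches strand at each crossover."""
    gamete = []
    current, other = hap_a, hap_b
    next_co = 0
    for idx, pos in enumerate(site_positions):
        while next_co < len(crossovers) and crossovers[next_co] <= pos:
            current, other = other, current
            next_co += 1
        gamete.append(current[idx])
    return gamete


rng = random.Random(42)
positions = [i * 0.02 for i in range(100)]  # 100 sites on a 2 Morgan chromosome
gamete = recombine("A" * 100, "B" * 100, positions, sample_crossovers(2.0, rng))
```

A caller would normally choose the starting haplotype at random; here the gamete always starts on `hap_a` to keep the sketch short.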

#### 10.1.3 Simulated Scenarios

Three traits were considered, one with low heritability (the first, $h^2 = 0.2$) and two with high heritability (the second and the third, $h^2 = 0.5$). The correlation between the first and second traits was either positive ($\rho_G = 0.5$) or negative ($\rho_G = -0.5$). The third trait always segregated independently of the other two.

The selection process involved two unrestricted indices: the LPSI (see Chap. 2), which ranks the progenies based on the average merit of their lines considering equal economic weights for all traits, and the ESIM (see Chap. 7), where the progenies were ranked in terms of ESIM values. Regarding the restricted selection indices, the RLPSI (or K&N) was employed (see Chap. 3) with equal economic weights for the traits in addition to the RESIM (see Chap. 7). Because of the restrictions, two different situations were evaluated in the latter cases, i.e., where the restrictions were applied for each of the first and second traits separately.

Thus, all simulated scenarios encompass a three-way factorial: four selection procedures (the LPSI, the ESIM, the RLPSI or K&N, and the RESIM); two correlation scenarios, positive ($\rho_G = 0.5$) and negative ($\rho_G = -0.5$) correlations between the first and second traits; and two constraint situations, where the restrictions were applied separately for the first and second traits.
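The factorial layout can be enumerated directly. The sketch below simply crosses the three factors named in the text; the labels are ours, and how the constraint factor was handled for the unrestricted indices is not detailed in the text, so the full cross shown here is an assumption.

```python
from itertools import product

INDICES = ("LPSI", "ESIM", "RLPSI", "RESIM")
GENETIC_CORRELATIONS = (0.5, -0.5)   # between traits 1 and 2
RESTRICTED_TRAITS = (1, 2)           # trait to which the restriction is applied

scenarios = list(product(INDICES, GENETIC_CORRELATIONS, RESTRICTED_TRAITS))

for index_name, rho_g, restricted_trait in scenarios:
    print(index_name, rho_g, restricted_trait)
```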

To simulate genetically correlated traits, a full pleiotropic model was assumed. Gene effects were sampled from a multivariate normal distribution with zero mean and a previously stated variance–covariance matrix. In this way it is possible to represent a quantitative, infinitesimal model. Each gene has its own effect, which varies according to a probability density; that is, genes have positive or negative effects of varying size, with large-effect alleles at lower frequency (major genes) and modest-effect alleles at higher frequency (minor genes).

#### 10.1.4 Inferences

Results are presented as summaries of 100 Monte Carlo replicates for each scenario and include the response to selection, decreases in the genetic variance, selection accuracy, and observed heritabilities. The meiosis routine was implemented in C++ and compiled and linked through the facilities provided by the Rcpp R package (Eddelbuettel 2013). All simulations were performed, analyzed, and summarized in R version 3.3.3 (R Development Core Team 2017).

#### 10.2 Results

The results of the simulations for the four selection indices under the different genetic correlations and restrictions are presented in terms of the consistency of the observed heritabilities of the traits; the response to selection and changes in genetic variance for each trait; and the selection accuracy of the indices.

First, the results show the stability of the Monte Carlo replicates in terms of deviations of the observed heritability from its expected value, which could otherwise affect further inferences (Table 10.1). The t tests comparing expected and observed heritabilities, at type I error rate α, did not show important significant departures in the simulated scenarios. Slight departures that may be due to Monte Carlo error (P < 0.05) were found in the following cases: for both the high and low heritability traits under the LPSI at cycle 5 when they were negatively correlated; for the independent trait, also under the LPSI, at cycle 50 when the other traits were positively correlated; for the highly heritable trait at the first and last cycles under positive correlation in the ESIM and the RESIM respectively; and for the low heritability trait in both restricted indices (RLPSI and RESIM) at cycles 0 and 5 for negative and positive correlations respectively.

A complementary estimate of the power of the tests (1 − β, where β is the type II error rate) was obtained considering departures of 1% from the expected heritabilities. The average power was around 70%, which supports the appropriateness of the simulation findings.

#### 10.2.1 Realized Genetic Gains

Figure 10.1 shows the average genetic gains (expressed as standard deviations from the mean of cycle 0) over cycles 0–50 for the three traits (low heritability, high heritability, and independent) under the four selection indices (unrestricted: LPSI and ESIM; restricted: RLPSI and RESIM) when the correlations are positive and negative.
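The scaling used for the gains can be sketched as follows. The function name, the toy values, and the use of the population (rather than sample) standard deviation of cycle 0 are assumptions; the chapter does not state which estimator was used.

```python
import statistics


def standardized_gains(cycle_values):
    """Gain of each cycle in units of the cycle-0 standard deviation.

    cycle_values[t] holds the genetic values of the individuals in cycle t.
    """
    base_mean = statistics.fmean(cycle_values[0])
    base_sd = statistics.pstdev(cycle_values[0])
    return [(statistics.fmean(values) - base_mean) / base_sd
            for values in cycle_values]


# Hypothetical genetic values for three consecutive cycles of one trait.
cycles = [
    [8.0, 10.0, 12.0],
    [10.0, 12.0, 14.0],
    [13.0, 15.0, 17.0],
]
gains = standardized_gains(cycles)
```

Expressing every cycle relative to cycle 0 puts traits measured on different scales on a common axis, which is what allows the panels of Fig. 10.1 to be compared directly.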

It is important to note that even after 50 recurrent cycles none of the scenarios has shown any indication that the selection plateau has been reached (Fig. 10.1). It is considered that even with the variation of the gains in the scenarios, there were increases in the merit of the target traits. Thus, the employment of selection indices is an effective way of achieving progress in long-term multi-trait selection.

As expected, the unrestricted selection indices showed higher genetic gains than their restricted counterparts (Fig. 10.1). The restrictions behaved as intended: when a trait was restricted, no gains were obtained for that trait (data not shown). The higher gains of the unrestricted indices relative to their restricted counterparts are well known and justified: the net genetic merit benefits from the gains in all traits, whereas when gains are constrained to zero in some traits, the indirect gains that would otherwise arise, especially from positive correlations, are lost.

The independent trait showed the highest gains in comparison with the other traits in all correlation and selection scenarios. For this trait, the highest gains were obtained with the RESIM, followed by the RLPSI, under both positive and negative correlations (Fig. 10.1e and f). These findings may be understood in light of both the nature of the trait (independent inheritance) and the properties of the restricted indices.



Fig. 10.1 Average genetic gains in 100 Monte Carlo replicates for traits with low and high heritability (h<sup>2</sup> = 0.2 and 0.5) and an independent trait along cycles 0–50 of a simulated selection given four indices, the linear phenotypic selection index (LPSI), the ESIM, the restricted linear phenotypic selection index (RLPSI), and the RESIM, with positive (0.5) and negative (−0.5) correlations between the low h<sup>2</sup> and high h<sup>2</sup> traits. (a) Gains for the trait with low heritability when it is negatively correlated with the high heritability trait. (b) Gains for the trait with low heritability when it is positively correlated with the high heritability trait. (c) Gains for the trait with high heritability when it is negatively correlated with the low heritability trait. (d) Gains for the trait with high heritability when it is positively correlated with the low heritability trait. (e) Gains for the independent trait when the other traits are negatively correlated. (f) Gains for the independent trait when the other traits are positively correlated

Because the third trait is independent of the others, there are no indirect effects on it from the constraints on the gains of the other traits. With regard to the technical features of the RESIM, it must be emphasized that, because of the eigen decomposition, the eigenvector associated with the largest eigenvalue gives the highest weight to the most variable trait, which in this case is the independent trait, and this results in distinct gains.
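The eigenvector argument can be illustrated with a small power-iteration sketch. This is only an illustration of the general property: the matrix below is a made-up covariance matrix, not the matrix that the ESIM or RESIM actually decomposes (see Chap. 7 for that), and the function names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leading_eigenvector(matrix, iterations=200):
    """Unit-length eigenvector of the largest eigenvalue, by power iteration."""
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]
    return vector


# Made-up covariance matrix: traits 1 and 2 are correlated with each other,
# trait 3 is independent of both and has the largest variance.
covariance = [
    [1.0, 0.5, 0.0],
    [0.5, 2.0, 0.0],
    [0.0, 0.0, 4.0],
]
weights = leading_eigenvector(covariance)
```

In this toy case the leading eigenvector places essentially all of its weight on the third trait, mirroring the explanation given for why the independent trait shows distinct gains.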

The Smith index (LPSI) and the ESIM produce similar genetic gains for the highly heritable trait when the genetic correlation is positive (Fig. 10.1d). The ESIM is simply another way of obtaining the LPSI based on eigen decomposition theory, which avoids the assignment of economic weights. Thus, the results indicate that similar outcomes may be obtained with both indices. However, the ESIM is the preferred index because of its advantages over the LPSI: no subjective decision is needed to select economic weights, and it has better statistical sampling properties.

When the traits are negatively correlated, the trait with greater heritability showed important realized genetic gains under the ESIM and similar gains under the LPSI and its restricted analog, the RLPSI (Fig. 10.1a and c). In addition, when traits are negatively correlated, restricting the trait with low heritability is an alternative that ensures progress for the highly heritable trait similar to that obtained with unrestricted indices. In contrast, the ESIM has the worst performance for the trait with lower heritability when the traits are negatively correlated (Fig. 10.1a).

On the other hand, as already pointed out, the ESIM surpasses all the other indices with regard to the highly heritable trait (Fig. 10.1c and d). The reason is similar to that mentioned above regarding the properties of the eigen decomposition. When the first trait is negatively correlated with the second, heavier weight is given to the trait with higher heritability than to the trait with low heritability. However, when the traits are positively correlated, synergistic and indirect effects increase both traits, each positively affecting the other.

When the traits are positively correlated, for the trait with low heritability the LPSI and the ESIM give similar realized genetic gains until cycle 25; after this cycle, the LPSI is superior to the ESIM (Fig. 10.1b). In this case, the two restricted indices, the RLPSI and the RESIM, give lower realized genetic gains than the LPSI and the ESIM (Fig. 10.1b). Finally, for the third trait (the independent one), the RESIM provides the greatest realized genetic gains (Fig. 10.1e and f).

#### 10.2.2 Genetic Variances

Figure 10.2 shows the average relative decreases in the genetic variances over the 50 cycles of selection for the three traits (low heritability, high heritability, and independent) under the four selection indices, restricted (the RLPSI and the RESIM) and unrestricted (the LPSI and the ESIM), with both negative and positive correlations between the first and second traits.

As a general result, it is clear that selection decreased the genetic variance over the recurrent cycles (Fig. 10.2). From the most conservative decrease (around 40% in Fig. 10.2a and b) to the sharpest decrease (close to 10% in Fig. 10.2e and f), and in contrast to the trends in genetic gains, it appears that the genetic variability was not yet exhausted by selection. This observation supports what was said regarding the effectiveness of the selection indices as a criterion for long-term multi-trait selection.

As expected, the restricted indices are more conservative, maintaining greater genetic variance (Fig. 10.2). Their purpose is to prevent the restricted trait from changing its genetic merit. Thus, they tend to keep its genetic variance unchanged, which is reflected in the smaller decreases in the genetic variance, even under the indirect effects of the other traits.

Fig. 10.2 Average genetic variances in 100 Monte Carlo replicates for traits with low and high heritability (h<sup>2</sup> = 0.2 and 0.5) and an independent trait along cycles 0–50 of a simulated selection given four selection indices, the LPSI, the ESIM, the RLPSI, and the RESIM, with positive (0.5) and negative (−0.5) correlations between the low h<sup>2</sup> and high h<sup>2</sup> traits. (a) Genetic variance of the low heritability trait when it is negatively correlated with the high heritability trait. (b) Genetic variance of the low heritability trait when it is positively correlated with the high heritability trait. (c) Genetic variance of the high heritability trait when it is negatively correlated with the low heritability trait. (d) Genetic variance of the high heritability trait when it is positively correlated with the low heritability trait. (e) Genetic variance of the independent trait when the other traits are negatively correlated. (f) Genetic variance of the independent trait when the other traits are positively correlated

It should be noted that there was a slight increase in variance in the short term (up to cycle 3) for the trait with lower heritability when negatively correlated with the highly heritable one (Fig. 10.2a and b). This is an outcome of the changes in allele frequencies of the first trait due to the indirect effects of the second trait and/or the release of genetic disequilibrium owing to the assortative mating of the individuals given higher weights regarding the second trait (highly heritable).

Reflecting the findings regarding the genetic gains (Fig. 10.1), the trait with the strongest average decreases in genetic variance was the one in which the response to selection was most pronounced, i.e., the independent trait (Fig. 10.2e and f). This trait showed the strongest decreases under selection with the ESIM in both positive and negative correlation scenarios. As mentioned before, because the third trait is independent of the others, a greater response to selection was achieved in that trait, with consequently strong changes in allele frequencies, which drove the decreases in genetic variance.
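The link between allele frequencies and genetic variance can be made concrete with the textbook expression for additive variance at biallelic loci, 2p(1 − p)a², summed over loci. The sketch below assumes purely additive gene action and linkage equilibrium, which is a simplification of the simulated populations; the frequencies and effects are made up.

```python
def additive_variance(frequencies, effects):
    """Sum over loci of 2 p (1 - p) a^2 (additive action, linkage equilibrium assumed)."""
    return sum(2.0 * p * (1.0 - p) * a * a
               for p, a in zip(frequencies, effects))


# Hypothetical two-locus trait: allele effects and favorable-allele frequencies
# at three points in a selection program.
effects = [1.0, 0.5]
start = additive_variance([0.30, 0.50], effects)
middle = additive_variance([0.50, 0.70], effects)
late = additive_variance([0.95, 0.95], effects)
```

When a favorable allele starts below a frequency of 0.5, selection first moves it toward 0.5 and the variance rises slightly (`middle` exceeds `start`); as alleles approach fixation the variance falls (`late` is the smallest). This is one way to read both the short-term increase and the long-term decrease reported for Fig. 10.2.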

When the heritability is high, it is easy to differentiate the trends in the decrease in the genetic variance between restricted and unrestricted indices (Fig. 10.2c). The difference is especially evident when the traits are positively correlated (Fig. 10.2d). Thus, the ESIM has the largest decreases, followed by the LPSI. For the trait with low heritability, however, the decreases in genetic variance are indistinguishable between the indices, showing that the effectiveness of the response to selection is a function of the heritability (Fig. 10.2a and b).

#### 10.2.3 Selection Accuracy

The accuracy of the selection was measured as the square root of the correlation between the net genetic merit and the estimated linear function of each index. Figure 10.3 shows the absolute accuracies (left axis) and relative values in relation to the mean accuracy of the first cycle (right axis) for all indices in both negative (Fig. 10.3a) and positive (Fig. 10.3b) correlation scenarios.
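The quantity underlying this accuracy measure can be sketched as a Pearson correlation between the net genetic merit H = w′g and the index I = b′y across individuals. The genotypic values, phenotypic values, and coefficients below are made up, and the sketch returns the plain correlation; any further transformation, such as the square root mentioned in the text, is left out.

```python
import math


def linear_combination(coefficients, rows):
    """Value of sum_j coefficients[j] * row[j] for each row."""
    return [sum(c * v for c, v in zip(coefficients, row)) for row in rows]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: four individuals, two traits.
genotypic_values = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
phenotypic_values = [[2.0, 2.0], [1.0, 2.0], [3.0, 5.0], [5.0, 2.0]]
economic_weights = [1.0, 1.0]      # w, hypothetical
index_coefficients = [0.6, 0.4]    # b, hypothetical

merit = linear_combination(economic_weights, genotypic_values)
index = linear_combination(index_coefficients, phenotypic_values)
accuracy = pearson(merit, index)
```

In a simulation the true genotypic values are known, so this correlation can be computed directly at every cycle, which is what makes a plot such as Fig. 10.3 possible.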

In all cases, a reduction in the selection precision of all the indices was observed. The effect of selection is the improvement in the genetic merit of the traits by means of changes in allele frequencies that also affect/decrease the genetic variance. However, as a side effect, the selection becomes harder and has lower precision.

The LPSI showed greater accuracy than the other indices in every situation (Fig. 10.3a and b). Its main feature is precisely that it maximizes the correlation between the net genetic merit and the linear combination of the traits. It may be argued that the ESIM also does so; however, the indices are the best linear predictors only when the phenotypic and genotypic variances and covariances are known. Thus, the results suggest that the ESIM was more affected by the sampling properties of the estimated variance and covariance matrices (Fig. 10.3a).

Fig. 10.3 Average absolute and relative accuracy of selection in 100 Monte Carlo replicates for traits with low and high heritability (h<sup>2</sup>) and an independent trait along cycles 0–50 of a simulated selection given four selection indices, the LPSI, the ESIM, the RLPSI, and the RESIM, with positive and negative correlations between the low h<sup>2</sup> and high h<sup>2</sup> traits

For the scenario with positive correlations, the differences between the two types of indices, the restricted ones and the unrestricted ones, were clear, as the unrestricted indices have shown greater selection accuracy (Fig. 10.3b). This reflects the fact that the restricted index constrains the gains by means of restrictions in the correlation between the net genetic merit and the linear combination of the traits.

#### References

Eddelbuettel D (2013) Seamless R and C++ integration with Rcpp. Springer, New York

Haldane JBS (1919) The combination of linkage values and the calculation of distance between the loci of linked factors. J Genet 8:299–309

Hazel LN (1943) The genetic basis for constructing selection indexes. Genetics 28:476–490

Kempthorne O, Nordskog AW (1959) Restricted selection indices. Biometrics 15:10–19

R Core Team (2017) R: A language and environment for statistical computing

Smith HF (1936) A discriminant function for plant selection. Ann Eugenics 7:240–250


### Chapter 11 RIndSel: Selection Indices with R

Gregorio Alvarado, Angela Pacheco, Sergio Pérez-Elizalde, Juan Burgueño, and Francisco M. Rodríguez

Abstract RIndSel is a graphical user interface that uses selection index theory to select individual candidates as parents for the next selection cycle. The index can be a linear combination of phenotypic values, genomic estimated breeding values, or a linear combination of phenotypic values and marker scores. Based on the restriction imposed on the expected genetic gain per trait, the index can be unrestricted, null restricted, or a predetermined proportional gain index. RIndSel is compatible with any of the following versions of Windows: XP, 7, 8, and 10. Furthermore, it can be installed on 32-bit and 64-bit computers. In the context of fixed and mixed models, RIndSel estimates the phenotypic and genetic covariance using two main experimental designs: randomized complete block design and lattice or alpha lattice design. In the following, we explain how RIndSel can be used to determine individual candidates as parents for the next cycle of improvement.

#### 11.1 Background

The linear selection index theory (see Chaps. 2 to 9 for details) can be difficult to apply without the use of specific code developed in statistical analysis system (SAS) software. At the International Maize and Wheat Improvement Center (CIMMYT, for its Spanish acronym), codes were developed in SAS software version 9.4 (SAS Institute 2017) that can help to determine individuals as parents for the next selection cycle. The SAS codes can be found at the following link: https://data.cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10242.

G. Alvarado · A. Pacheco · J. Burgueño (\*) · F. M. Rodríguez

Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico, Mexico

e-mail: g.alvarado@cgiar.org; J.Burgueno@cgiar.org

S. Pérez-Elizalde

Departamento de Socioeconomía Estadística e Informática, Colegio de Postgraduados, Mexico, Mexico

J. J. Céron-Rojas, J. Crossa, Linear Selection Indices in Modern Plant Breeding, https://doi.org/10.1007/978-3-319-91223-3\_11

Afterward, the SAS codes were translated into R scripts (Pacheco et al. 2017) and named RIndSel (R software to analyze Selection Indices), with the objective of creating a user-friendly graphical user interface (GUI) in Java. The link to download the software is: https://data.cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10854.

#### 11.2 Requirements, Installation, and Opening

RIndSel is compatible with the Windows platform in any of the following versions: XP, 7, 8, and 10; furthermore, it can be installed on 32-bit and 64-bit computers. To install RIndSel, the user must double-click on the executable file downloaded from the link given above and then follow the instructions that appear in the installation box. Once RIndSel has been installed, it can be opened by:


As we shall see, the software has been partitioned into two modules.

#### 11.3 First Module: Data Reading and Helping

This module (Fig. 11.1) displays two small buttons at the upper left, labeled "Open File" and "Help." With Open File, the user can browse a set of files and open, for example, the file of phenotypic data, which should contain information associated with the experimental design. This file contains the field book: the experimental design variables are in the first columns, whereas the remaining columns contain the traits measured in the field; design variables and traits are connected by the plot number.

Beforehand, the data set should have been captured in a spreadsheet using Excel or any similar software and saved as a comma delimited file. To save the data as a comma delimited file in Excel, open the file that contains the data set (Fig. 11.2) and select from the main menu: FILE → Save As → Browser View Options (choose the path where the data will be saved) → Save as type (choose CSV, comma separated values). The file name should end in ".csv", indicating that the file is ready to be used.

Fig. 11.1 Module for reading data

Fig. 11.2 Steps for saving a comma delimited file
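The field book layout described above (design variables first, traits afterward) can be checked before loading a file into RIndSel. The following is a minimal sketch, not part of RIndSel: the column names `plot`, `rep`, `block`, and `entry`, the helper name, and the data values are illustrative assumptions.

```python
import csv
import io


def read_field_book(csv_text, design_columns):
    """Split a comma delimited field book into design variables and trait values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    trait_columns = [c for c in reader.fieldnames if c not in design_columns]
    records = []
    for row in reader:
        design = {c: row[c] for c in design_columns}
        # Trait cells must be numeric; float() raises ValueError otherwise.
        traits = {c: float(row[c]) for c in trait_columns}
        records.append((design, traits))
    return trait_columns, records


# Made-up field book with four design columns followed by three traits.
example = (
    "plot,rep,block,entry,T1,T2,T3\n"
    "1,1,1,12,150.2,45.1,8.3\n"
    "2,1,1,7,162.8,43.9,7.6\n"
)
trait_columns, records = read_field_book(
    example, design_columns=["plot", "rep", "block", "entry"]
)
```

For a file on disk, the same function can be fed the text returned by `open(path, newline="").read()`. A failed numeric conversion usually points to a stray text cell or a file saved in a format other than CSV.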

The "Help" button (Fig. 11.1) gives access to basic resources such as the installation manual and the software licenses. The installation manual provides a brief description of the selection indices that can be calculated and the path to where the software is located (Fig. 11.3). Furthermore, it shows folders related to the software features, such as how the software can be used. There is also a folder called "Examples," where the user can find data for testing phenotypic selection indices, selection indices based on coded marker scores, and genome-wide selection indices. The folders "Lib" and "Programs" contain files related to the functioning of the software; therefore, the authors strongly recommend not modifying these folders.

Fig. 11.3 Tree diagram of the RIndSel structure

#### 11.4 Second Module: Capturing Parameters to Run

Once the data have been read (first module), RIndSel moves to the second module (Fig. 11.4), where some feedback is required:


This module is structured in such a way that calculating any selection index is relatively easy. There are three small buttons at the upper left of the module: "Back," "Analyze," and "Help." Back returns to the previous module (Fig. 11.1), Analyze calculates the selection index, and Help provides the same functions as described in the previous section. In addition, there are four windows, each of which must be filled with the correct parameters. The first one is related to the indices that RIndSel is able to calculate (Fig. 11.5).

#### 11.5 Selection Index

In this menu, it is necessary to define the percentage of genotypes that will be selected. By default, it is 5%, but any other percentage can be chosen. RIndSel uses either the correlation matrix or the variance–covariance matrix to obtain the index; by default, the variance–covariance matrix is used. To work with the correlation matrix, the box "Correlation" should be checked. The sign of the "economic weights" can be used to determine the behavior of the expected genetic gain of the traits. For example, with -1, the mean of the trait tends to decrease, whereas with 1, it increases. It is also possible to use the trait heritability. The economic weights can be assigned by creating a comma-delimited file with the name of each trait and the sign of its economic weight (Fig. 11.6a). Once the file has been created, it can be loaded by pressing the open button and browsing to where the \*.csv file is located (Fig. 11.6b).

Fig. 11.4 RIndSel module of analysis

To calculate the restricted linear phenotypic selection index (RLPSI or K&N, see Chap. 3 for details), it is necessary to create the same file and incorporate an additional column called "Restrictions." This last column must be filled with the number one for those traits that remain fixed (restricted) and zeros for those traits that change (Fig. 11.7). An additional option is to ignore the "Weights" box, which means that RIndSel automatically presents an Excel file covering the options for capturing economic weights; the only requirement is that the file must be saved as a comma delimited file.
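A weights file of the kind described above could be generated as in the following sketch. Only the "Restrictions" column name comes from the text; the header names `trait` and `weight` and the helper name are assumptions, so the exact headers expected by RIndSel should be taken from Figs. 11.6 and 11.7.

```python
import csv
import io


def weights_csv(weights, restricted=None):
    """Build the text of an economic weights file.

    weights: dict mapping trait name to +1 or -1.
    restricted: optional set of trait names to hold fixed; when given,
    a "Restrictions" column is added (1 = restricted, 0 = free).
    """
    header = ["trait", "weight"]  # hypothetical header names
    if restricted is not None:
        header.append("Restrictions")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for trait, sign in weights.items():
        if sign not in (1, -1):
            raise ValueError(f"economic weight sign for {trait} must be 1 or -1")
        row = [trait, sign]
        if restricted is not None:
            row.append(1 if trait in restricted else 0)
        writer.writerow(row)
    return buffer.getvalue()


signs = {"T1": 1, "T2": -1, "T3": 1}
unrestricted_file = weights_csv(signs)
restricted_file = weights_csv(signs, restricted={"T1"})
```

The returned text can be written to disk with `open(path, "w", newline="").write(...)`. Keeping the unrestricted and restricted variants in one helper makes it harder for the two files to drift apart when traits are added.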

#### 11.6 Experimental Design

This menu allows the user to select the field design to be used. There are two choices: randomized complete block design, and lattice or alpha-lattice design.


Fig. 11.5 Flow diagram of the selection indices that RIndSel is able to calculate; <sup>1</sup> Smith (1936), <sup>2,3</sup> Cerón-Rojas (2008a), <sup>4</sup> Lande R, Thompson R (1990), <sup>5</sup> Cerón-Rojas (2008b), <sup>6</sup> Cerón-Rojas (2015)


Fig. 11.6 Example of (a) the content of the economic weights file and (b) the file location

Fig. 11.7 Economic weights for restricted selection indices

#### 11.7 Variable Selection

The experimental design is closely related to the "Variable Selection" menu, where the variables that constitute the experimental design are identified. Thus, we can choose the variables that correspond to "Location," to the replicate (for a randomized complete block design), and to the block when the experiment is a lattice or alpha-lattice.

#### 11.8 Response Variables

In this menu, the user can select traits to be used to calculate the selection index. It can be activated by clicking on the trait to be selected. Figure 11.8 shows an example of how this window must be filled when a Smith phenotypic selection index is calculated.


Fig. 11.8 Example of parameters that could be used to calculate a phenotypic selection index

#### 11.9 Molecular Selection Indices

If the selection index to be calculated is molecular, such as the Lande and Thompson (1990) or the linear molecular selection index (Fig. 11.9, and see Table 1.1, Chap. 1, for details), two additional files are required:


Marker scores can be obtained by making a regression of the phenotypic values on a codified molecular markers matrix (see Chap. 4 for details). The file can be created in Excel and must have the score with its respective marker for each trait; this file is saved with a .csv extension. An example of how these kinds of files must be generated is shown in Fig. 11.10a.

To calculate the scores in an F2 population, the molecular markers must previously have been codified as -1, 0, and 1 for genotypes aa, Aa, and AA respectively. When the data come from a recombinant inbred line population, the molecular markers should be codified as -1 and 1 for the homozygous genotypes aa and AA respectively. In the context of the genomic selection indices (LGSI) (see Chap. 5 for details), it is only necessary to codify the molecular marker matrix (Fig. 11.10b), as these indices do not require a marker score.
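The coding rule above, together with a one-marker version of the regression used to obtain marker scores, can be sketched as follows. The function names and data are hypothetical, and the single-marker slope is a simplification; the procedure actually used for marker scores is described in Chap. 4.

```python
F2_CODES = {"aa": -1, "Aa": 0, "aA": 0, "AA": 1}
RIL_CODES = {"aa": -1, "AA": 1}


def code_markers(genotypes, population="F2"):
    """Translate genotype labels into the numeric codes described in the text."""
    if population == "F2":
        table = F2_CODES
    elif population == "RIL":
        table = RIL_CODES
    else:
        raise ValueError("population must be 'F2' or 'RIL'")
    # An unexpected label (e.g. a heterozygote in a RIL) raises KeyError.
    return [table[g] for g in genotypes]


def single_marker_slope(codes, phenotypes):
    """Least-squares slope of phenotype on one coded marker."""
    n = len(codes)
    mean_x = sum(codes) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(codes, phenotypes))
    sxx = sum((x - mean_x) ** 2 for x in codes)
    return sxy / sxx


f2_codes = code_markers(["aa", "Aa", "AA"])
ril_codes = code_markers(["AA", "aa"], population="RIL")
slope = single_marker_slope(f2_codes, [8.0, 10.0, 12.0])  # made-up phenotypes
```

Rejecting unknown labels rather than coding them silently is a deliberate choice in this sketch: a miscoded marker matrix would otherwise propagate into every score computed from it.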


Fig. 11.9 Example of parameters that could be used to calculate a molecular selection index


Fig. 11.10 Comma delimited files read in Excel for (a) scores of markers for traits plant height (PHT) and ear height (EHT), (b) a codified molecular marker matrix

#### 11.10 How to Use RIndSel

The use of RIndSel can be illustrated with an example from the Smith linear phenotypic selection index (LPSI) (Smith 1936, see Chap. 2 for details). Figure 11.11 shows the phenotypic data (Fig. 11.11a), together with the file of economic weights (Fig. 11.11b). Three simulated traits (T1, T2, and T3) described in Chap. 2 were used. T1 and T3 are positive (economic value = 1), whereas trait T2 is negative (economic value = -1). It is important to remember that all data files must be saved in comma delimited format (\*.csv).

After the data and economic weights files have been generated, the data need to be loaded into RIndSel; thus, it is important to know the path to where the files are located (e.g., "C://Book/datafile/C1\_PSI\_05\_Phen.csv"). Once the data file has been located, it is loaded by clicking on it, which starts the process automatically. It is then possible to go to the second module (Fig. 11.12) and select the subsequent parameters from the menus. In this case, Selection Index: Smith; Percent: 5; Weights: here we must browse to where the economic weights file is located, for example "C://Book/datafile/C1\_PSI\_05\_Phen Weights.csv". Once this file has been located, it must be selected by clicking on it.


Fig. 11.11 Simulated data from Chap. 2 with (a) array in an alpha-lattice and (b) economic weights required to test the Smith linear phenotypic selection index (LPSI)


Fig. 11.12 Example of filling in a phenotypic selection index without restrictions

After the selection index windows have been filled, the next menu is Experimental design, which allows the user to select the appropriate design (for example, a lattice). To select the design variables, the user must go to the Variable Selection menu. In this example, the experiment has only one location, and the following should be selected: rep as Replicate, block as Block, and entry as Genotype. An output name for the index must be assigned by typing it in the Output folder box, which is below the Variable Selection menu. For the Smith LPSI, the name chosen was SmithSimulated. Finally, the Response Variables menu should be filled by selecting the traits T1, T2, and T3.

#### 11.11 RIndSel Output

This section explains the structure of the RIndSel output. First, RIndSel presents the genotypic variance–covariance matrix and the phenotypic variance–covariance matrix (Table 11.1). In addition, when the selection index involves molecular data, RIndSel presents an additional molecular variance–covariance matrix, which contains the additive variability associated with the markers (Table 11.2).

RIndSel also presents a table with the estimated values of the index parameters (Table 11.3). These estimates are the covariance of the selection index, the variance of the selection index, the net genetic merit (breeding value), the correlation between the selection index and the net genetic merit, the selection response, and the heritability of the index (see Chap. 2 for additional details).
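For orientation, the standard LPSI quantities behind such a table can be computed from the phenotypic matrix P, the genotypic matrix G, and the economic weights w, using b = P⁻¹Gw. The sketch below is not RIndSel's code: the matrices are made up, the helper names are hypothetical, and the mapping of each returned entry to a row of Table 11.3 is an assumption.

```python
import math
from statistics import NormalDist


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def quadratic_form(vector, matrix):
    return sum(v * x for v, x in zip(vector, mat_vec(matrix, vector)))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def selection_intensity(proportion):
    """Standardized selection differential under truncation on a normal index."""
    threshold = NormalDist().inv_cdf(1.0 - proportion)
    return NormalDist().pdf(threshold) / proportion


def lpsi_parameters(P, G, w, proportion=0.05):
    b = solve(P, mat_vec(G, w))            # index coefficients b = P^-1 G w
    index_variance = quadratic_form(b, P)  # variance of I = b'y
    merit_variance = quadratic_form(w, G)  # variance of H = w'g
    return {
        "b": b,
        "index_variance": index_variance,
        "merit_variance": merit_variance,
        "accuracy": math.sqrt(index_variance / merit_variance),
        "response": selection_intensity(proportion) * math.sqrt(index_variance),
        "index_heritability": quadratic_form(b, G) / index_variance,
    }


# Made-up two-trait example with weights +1 and -1.
P = [[2.0, 0.0], [0.0, 4.0]]
G = [[1.0, 0.0], [0.0, 1.0]]
result = lpsi_parameters(P, G, w=[1.0, -1.0])
```

With b chosen this way, the covariance between index and net genetic merit equals the index variance, which is why the accuracy reduces to the square root of a variance ratio. Selecting 5% corresponds to a selection intensity of about 2.06 under the normality assumption.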

Additional results are presented in Table 11.4, which shows the ranked selected individuals; the ranking is based on the estimated selection index values. Table 11.4 also presents the means of the traits of the selected individuals, the means of the traits of the total population, the selection differential (see Chap. 2), and the expected genetic gain per trait. Selected individuals can be identified by the first column, called "rownames"; columns 2 to 4 contain the best linear unbiased estimator of each trait mean. Finally, column 5 presents the estimated selection index values.

Table 11.3 Estimated selection index parameters given by the RIndSel output

Table 11.4 Values of the three traits for selected individuals and the values of the Smith linear phenotypic selection index, means and gains with k = 5%

The selection differential compares the means of the selected individuals with those of all individuals; in general, it is positive for traits whose economic weight was 1 and negative for traits whose economic weight was -1. The expected genetic gain is an inferential tool based on the normal distribution that depends on the percentage of selected individuals and gives the expected genetic gain per trait under the estimated index.
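The selection differential described above can be reproduced by hand from a table of trait means and index values. The following is a minimal sketch with made-up data; the record layout, the function names, and the rounding rule for the number of selected individuals are assumptions, not RIndSel's internals.

```python
def select_top(records, proportion=0.05):
    """Return the top fraction of records ranked by index value (highest first).

    Each record is (name, trait_means, index_value).
    """
    n_selected = max(1, round(len(records) * proportion))
    ranked = sorted(records, key=lambda record: record[2], reverse=True)
    return ranked[:n_selected]


def trait_means(records):
    n_traits = len(records[0][1])
    return [sum(record[1][j] for record in records) / len(records)
            for j in range(n_traits)]


def selection_differential(records, proportion=0.05):
    """Mean of the selected individuals minus mean of all individuals, per trait."""
    selected = select_top(records, proportion)
    return [s - a for s, a in zip(trait_means(selected), trait_means(records))]


# Made-up population of 20 genotypes with two traits; the index rewards
# trait 1 (weight 1) and penalizes trait 2 (weight -1).
population = []
for i in range(20):
    traits = [100.0 + i, 50.0 - i]
    population.append((f"G{i + 1}", traits, traits[0] - traits[1]))

differential = selection_differential(population, proportion=0.05)
```

In this toy population the differential is positive for the trait with weight 1 and negative for the trait with weight -1, which is the pattern described for Table 11.4.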

Finally, Table 11.5 shows the best linear unbiased estimators for all individuals, accompanied by their respective selection index values. In this case, only the first 20 individuals were included. This output is important because on some occasions it is necessary to examine the specific behavior of a group of genotypes that may not perform well on the index even though they showed good general performance in previous analyses. Another possibility is that a group of individuals belongs to a specific population group; thus, it is possible to select the best individual within this population group.

#### References


SAS Institute (2017) SAS user's guide: statistics module. Version 9.4. Ed. Cary, NC

Smith HF (1936) A discriminant function for plant selection. In: Papers on quantitative genetics and related topics. Department of Genetics, North Carolina State College, Raleigh, NC, pp 466–476
